1 / 18

Gazetteer database

Gazetteer database. 4.5 million items, each having: 1+ names fair to good discriminator 1 geospatial footprint 99.9% points, 0.1% boxes & polygons anticipate getting more (overlapping) polygons 1+ types excellent to horrible discriminator No obvious way to partition them.

kosey
Download Presentation

Gazetteer database

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Gazetteer database • 4.5 million items, each having: • 1+ names • fair to good discriminator • 1 geospatial footprint • 99.9% points, 0.1% boxes & polygons • anticipate getting more (overlapping) polygons • 1+ types • excellent to horrible discriminator • No obvious way to partition them Greg Janée • chit-chat with CS database folks • 10/26/01

  2. Gazetteer database • Informix Dynamic Server 2000 • Indexes • names: Verity (external; blade interface) • footprint: MapInfo (R-tree; blade interface) • types: B-tree Greg Janée • chit-chat with CS database folks • 10/26/01

  3. Gazetteer database • Queries • dynamic • general case: arbitrary boolean combinations • in practice: • [name] AND [footprint] AND [([type] OR [type] OR ...)] • Desired behavior • see some results immediately  queries • after seeing some results, ability to kill query Greg Janée • chit-chat with CS database folks • 10/26/01

  4. Gazetteer database • Observed behavior: • Either: query answered quickly (< 2 minutes) • optimizer picks “right” index • Or: query takes forever (> 30 minutes) • optimizer picks “wrong” index • Complications: • no way to kill or interrupt database thread, through JDBC or otherwise Greg Janée • chit-chat with CS database folks • 10/26/01

  5. Challenge #1 • Solve ADEPT’s query problem • Multiple, different data types • spatial, text, traditional linear types • Discriminability of any given index greatly depends on both data & query • Dynamic queries Greg Janée • chit-chat with CS database folks • 10/26/01

  6. Load balancing • Why • distribute those queries-from-hell • increase reliability • How • multiple independent, identical databases • middleware directs query to “best” database • database executes query in its entirety Greg Janée • chit-chat with CS database folks • 10/26/01

  7. Load balancing • What the middleware knows about a database: • current & maximum connection count • current queries • amount of time each query has been processing (QPT) • Idea: • score each DB inversely proportional to QPT2 Greg Janée • chit-chat with CS database folks • 10/26/01

  8. Challenge #2 • Define metric • overall query processing time • response time • Do better • incorporate connection counts into formula • analyze queries • keep history Greg Janée • chit-chat with CS database folks • 10/26/01

  9. Query translation • Gazetteer is an instance of a much more general problem • To wit: • how to describe the automatic translation of dynamic queries written in an abstract query language to SQL • in an easy, powerful, flexible way • making as few assumptions as possible about the underlying schema • and producing “reasonable” SQL • not so bad as to preclude database’s optimizer from working Greg Janée • chit-chat with CS database folks • 10/26/01

  10. Query translation • Collection ( database) contains items having IDs • query should return IDs; duplicates OK • response time more important than overall QPT • Search bucket • abstract, typed thing against which constraints may be placed;  standard buckets • Each collection supports 1+ buckets in idiosyncratic ways • Query language • arbitrary boolean combinations of bucket constraints Greg Janée • chit-chat with CS database folks • 10/26/01

  11. Query translation • Standard buckets • subject-related text • title • assigned term • originator • geographic location • coverage date • object type, feature type, ... • format • identifier Greg Janée • chit-chat with CS database folks • 10/26/01

  12. Query translation • Buckets grouped by types • spatial • e.g., overlaps a given polygon • temporal • e.g., contains a given date range • hierarchical • e.g., is any kind of geographic work • textual, qualified textual • e.g., contains the phrase “luis obispo” • numeric • e.g., > 5.7 meters Greg Janée • chit-chat with CS database folks • 10/26/01

  13. Query translation • Example: translating a spatial bucket • MapInfo datablade • ST_Contains(table.column, “HG_Box(coords)”) • Geodetic datablade • Inside(“GeoBox(coords)”, table.column) • four bounding coordinate columns • table.northcolumn >= ... and table.southcolumn <= ... and ... Greg Janée • chit-chat with CS database folks • 10/26/01

  14. Query translation • Existing Python-based scripting system • easy to configure & extend • comes with library of standard translation techniques • "geographic-location" :Bucket( "spatial", standardSpatialOperators,spatialToInformixMapInfo, ["j_holding", "footprint"]) • How to extend to boolean queries? Greg Janée • chit-chat with CS database folks • 10/26/01

  15. Query translation • Too-easy solution: • given constraints C1, C2 that translate into SQL constraints (T1, S1), (T2, S2) then constraintC1op C2 where op is AND, OR, or AND NOT becomes • select id from T1 where S1op id in (select id from T2 where C2) • But: Informix appears to execute subqueries in their entirety before considering the outer query Greg Janée • chit-chat with CS database folks • 10/26/01

  16. Query translation • The problems with JOINs • handling ANDs • tables may have {1, 1?, 1+, 0+} rows per item • handling ORs • may require UNION • but UNION is not nestable Greg Janée • chit-chat with CS database folks • 10/26/01

  17. Query translation • The problem with disjunctive normal form: • may be inefficient • select id from table where S1 and (S2 or S3) • versus • select id from table where S1 and S2unionselect id from table where S1 and S3 Greg Janée • chit-chat with CS database folks • 10/26/01

  18. Challenge #3 • Design a translation description system • easy to configure & extend • makes as few assumptions as possible about the underlying schema • should produce reasonable SQL by default • supports customization of translation process • supports pattern-based overrides Greg Janée • chit-chat with CS database folks • 10/26/01

More Related