1 / 28

Query Formation From High-Level Concepts for Relational Databases

Query Formation From High-Level Concepts for Relational Databases. Guogen Zhang Wesley Chu Frank Meng Gladys Kong Computer Science Department University of California Los Angeles, CA http://www.cobase.cs.ucla.edu. Outlines. Overview Semantic Graph Model

bobby
Download Presentation

Query Formation From High-Level Concepts for Relational Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Query Formation From High-LevelConcepts for Relational Databases Guogen Zhang Wesley Chu Frank Meng Gladys Kong Computer Science Department University of California Los Angeles, CA http://www.cobase.cs.ucla.edu

  2. Outlines • Overview • Semantic Graph Model • High-Level Query Formation for SPJ queries • Incremental Query Formation for Complex Queries • Conclusions

  3. Overview: Query Formation • Based on semantic graph model, including user-defined relationships • User specifies requests and constraints • Formulate simple query by graph search technique • Candidates ranked by information measure • English-like query description • A complex query can be formulated by a series of simple queries

  4. Related Work • Query formulation as Steiner tree problem (Wald and Sorenson, 1984) • limited to partial 2-tree graphs • Formulate simple Select-Project-Join (SPJ) queries via Universal Relation Model: no need to specify natural joins (Ullman 1988, Vardi, 1988) • Object-oriented query path expression completion: partial order relationship between different path for ranking (Ioannidis and Lashkari, 1994) • Query-by-Icon (QBI) [Massari and Chrysanthis, 1995] • Natural language interfaces (text/voice): logical form to query

  5. Semantic Graph Model • Weighted graph G=(V,E): • Nodes: entities -- strong, weak, user-defined • Links: relationships -- ISA, HAS, simple, complex, user-defined • For relational databases: • nodes: relations • links: natural and user-defined joins • Weight: information measure of a node or link

  6. Semantic Graph Example

  7. Query Feature • Query expression in a semantic graph • Query Topic, T: A set of Joins represented by links • Query Constraints, C: Query Conditions • Query Aspect, A: Attribute list

  8. A query topic for “aircraft can land on airports at geographical locations of countries” can land airfield_chars runways have located is a geoloc country airports

  9. Semi-Automatic Generation of Semantic Model • Find natural joins through key and foreign key between nodes. • User-defined links can be added into the graph model. • Designers need to specify link types and assign names to all the elements in the graph.

  10. Example of SemanticModel Generation AIRPORT: APORT_NM, GEOLOC_TYPE, GLC_CD, ELEV_FT, …; key: APORT_NM. RUNWAY: APORT_NM, RUNWAY_NM, GLC_CD, RUNWAY_LENGTH_FT, RUNWAY_WIDTH_FT, …; key: RUNWAY_NM. GEOLOC: GLC_CD, GLC_NM, CY_CD, LATITUDE, LONGITUDE, …; key: GLC_CD. COUNTRY: CY_CD, CY_NM, …; key: CY_CD. Links: AIRPORT--RUNWAY: APORT_NM; AIRPORT--GEOLOC: GLC_CD; RUNWAY--GEOLOC: GLC_CD; GEOLOC--COUNTRY: CY_CD;

  11. Information Measure Information measure of a node or link, a I(a) = - log P(a) where P(a) is the probability of a being used in queries. Assume nodes and links are independent, for a subgraph with a set of elements A={ai | i = 1, …, n}, information measure is additive: n I(A) = SUM I(ai) i = 1

  12. Information Measure(cont.) Initial Information Measure: all the nodes = 1 different nodes have a different value Information measure is normalized and converted into counts Probability of a node or a link is P(ai) = ci/c • Update Information measure • Ranking based on Information measure, thus adapt to user feedback

  13. Query Formulation To formulate (simple) queries without knowledge of query language or database schema Example: Find airports in Tunisia that can land a C-5 cargo plane User input: Query aspect: AIRPORTS.APORT_NM Constraints: AIRCRAFT_AIRFIELD_CHARS.AC_TYPE_NAME = ‘C-5’ COUNTRY_STATE.CY_NM = ‘Tunisia’ Links: CAN LAND

  14. Formulated Query SELECT R3.APORT_NM FROM AIRCRAFT_AIRFIELD_CHARS R0 AIRPORTS R3, COUNTRY_STATE R11 GEOLOC R12, RUNWAYS R16 WHERE R0.AC_TYPE_NM = ‘C-5’ AND R11.CY_NM = ‘Tunisia’ AND R0.WT_MIN_AVG_LAND_DIST_FT <= R16.RUNWAY_LENGTH-FT AND R0.WT_MIN_RUNWAY_WIDTH_FT <= R16.RUNWAY_WIDTH_FT AND R11.GLC_CD = R3. GLC_CD AND R3.APORT_NM = R16.APORT_NM AND R11.CY_CD = R11.CY_CD

  15. Query Completion as GraphSearch Problem Given: An incomplete input query topic Ti Find a set of links to complete the topic (to make Ti connected) Minimum Missing Information principle: The query completion candidate Tc (the missing links and nodes) for an incomplete input topic Ti contains the minimum information

  16. Query Formulation Algorithm • Input: subgraph T of the semantic graph G • Find candidates with the minimum Information measure • Two methods used to limit the search scope: • L-step-bound paths: paths that connect two components with at most L links, to limit search within the neighborhood of the input subgraph • k-minimum completion candidates: only at most k candidates with minimum Information measure are kept (alpha-beta pruning)

  17. Initial Components and 2-Step-BoundPaths For the “CAN LAND” Query repair (1) airfield_chars airports 2 can land have authorize airfield_chars runways (2) airfield_chars aircrafts airports 1 2 have runways airports (3) 1 airports at is a (4) airports runways geoloc 1 1 at located country (5) runways country geoloc 1 1 is a located (6) airports country geoloc 1 1 (a) Initial components (b) 2-step-bound paths

  18. The Semantic Graph For theTransportation Domain at can land airfield_chars runways 1 2 1 have located is a airports geoloc country 1 1 Relation Node weather

  19. Incremental Query Formulation • Incremental Query Formulation • To assist user reach a complex query goal with a series of simple queries • The subsequent queries may depend on results of preceding queries (derived relations) • Issues • Incorporate derived relations into the semantic graph • Suggest missing attributes to link isolated derived nodes to the graph

  20. Incremental Query Examples • Find airports in Tunisia. • Which of these airports can land a C-5? • What is the weather at these airports?

  21. Incorporating Derived Relations • Source relation: contributes attributes to the derived relations • Derived relation: inherits properties of attributes from their source relations • Deriving link: links to the source relations through inherited keys • Inherited link: inherits links from the source relations

  22. airporttunisiacanland airporttunisiacanlandweather Extended semantic graph showing derived nodes, derived links and inherited links at can land airfield_chars runways 1 2 1 have located is a airports geoloc country 1 1 airporttunisia weather Relation Node Derived Node Derived Link Inherited Link

  23. Suggesting KeyAttributes for a Query • Find source relations for the isolated derived relation. • Suggest key of the source relations as attributes to include.

  24. Concept and AttributeSpecification Interface

  25. Query Constraint Specification

  26. Action Specification

  27. English-Like Query Descriptionand the Formulated Query

  28. Conclusions • Semantic graph model provides a basis for query formulation search • Ranking of query candidates by information measure in formulation provides adaptive behavior • Incremental query formulation is effective for complex queries • GUI and voice interface can be built for query formulation from high-level concepts

More Related