280 likes | 375 Views
Query Formation From High-Level Concepts for Relational Databases. Guogen Zhang Wesley Chu Frank Meng Gladys Kong Computer Science Department University of California Los Angeles, CA http://www.cobase.cs.ucla.edu. Outlines. Overview Semantic Graph Model
E N D
Query Formation From High-LevelConcepts for Relational Databases Guogen Zhang Wesley Chu Frank Meng Gladys Kong Computer Science Department University of California Los Angeles, CA http://www.cobase.cs.ucla.edu
Outlines • Overview • Semantic Graph Model • High-Level Query Formation for SPJ queries • Incremental Query Formation for Complex Queries • Conclusions
Overview: Query Formation • Based on semantic graph model, including user-defined relationships • User specifies requests and constraints • Formulate simple query by graph search technique • Candidates ranked by information measure • English-like query description • A complex query can be formulated by a series of simple queries
Related Work • Query formulation as Steiner tree problem (Wald and Sorenson, 1984) • limited to partial 2-tree graphs • Formulate simple Select-Project-Join (SPJ) queries via Universal Relation Model: no need to specify natural joins (Ullman 1988, Vardi, 1988) • Object-oriented query path expression completion: partial order relationship between different path for ranking (Ioannidis and Lashkari, 1994) • Query-by-Icon (QBI) [Massari and Chrysanthis, 1995] • Natural language interfaces (text/voice): logical form to query
Semantic Graph Model • Weighted graph G=(V,E): • Nodes: entities -- strong, weak, user-defined • Links: relationships -- ISA, HAS, simple, complex, user-defined • For relational databases: • nodes: relations • links: natural and user-defined joins • Weight: information measure of a node or link
Query Feature • Query expression in a semantic graph • Query Topic, T: A set of Joins represented by links • Query Constraints, C: Query Conditions • Query Aspect, A: Attribute list
A query topic for “aircraft can land on airports at geographical locations of countries” can land airfield_chars runways have located is a geoloc country airports
Semi-Automatic Generation of Semantic Model • Find natural joins through key and foreign key between nodes. • User-defined links can be added into the graph model. • Designers need to specify link types and assign names to all the elements in the graph.
Example of SemanticModel Generation AIRPORT: APORT_NM, GEOLOC_TYPE, GLC_CD, ELEV_FT, …; key: APORT_NM. RUNWAY: APORT_NM, RUNWAY_NM, GLC_CD, RUNWAY_LENGTH_FT, RUNWAY_WIDTH_FT, …; key: RUNWAY_NM. GEOLOC: GLC_CD, GLC_NM, CY_CD, LATITUDE, LONGITUDE, …; key: GLC_CD. COUNTRY: CY_CD, CY_NM, …; key: CY_CD. Links: AIRPORT--RUNWAY: APORT_NM; AIRPORT--GEOLOC: GLC_CD; RUNWAY--GEOLOC: GLC_CD; GEOLOC--COUNTRY: CY_CD;
Information Measure Information measure of a node or link, a I(a) = - log P(a) where P(a) is the probability of a being used in queries. Assume nodes and links are independent, for a subgraph with a set of elements A={ai | i = 1, …, n}, information measure is additive: n I(A) = SUM I(ai) i = 1
Information Measure(cont.) Initial Information Measure: all the nodes = 1 different nodes have a different value Information measure is normalized and converted into counts Probability of a node or a link is P(ai) = ci/c • Update Information measure • Ranking based on Information measure, thus adapt to user feedback
Query Formulation To formulate (simple) queries without knowledge of query language or database schema Example: Find airports in Tunisia that can land a C-5 cargo plane User input: Query aspect: AIRPORTS.APORT_NM Constraints: AIRCRAFT_AIRFIELD_CHARS.AC_TYPE_NAME = ‘C-5’ COUNTRY_STATE.CY_NM = ‘Tunisia’ Links: CAN LAND
Formulated Query SELECT R3.APORT_NM FROM AIRCRAFT_AIRFIELD_CHARS R0 AIRPORTS R3, COUNTRY_STATE R11 GEOLOC R12, RUNWAYS R16 WHERE R0.AC_TYPE_NM = ‘C-5’ AND R11.CY_NM = ‘Tunisia’ AND R0.WT_MIN_AVG_LAND_DIST_FT <= R16.RUNWAY_LENGTH-FT AND R0.WT_MIN_RUNWAY_WIDTH_FT <= R16.RUNWAY_WIDTH_FT AND R11.GLC_CD = R3. GLC_CD AND R3.APORT_NM = R16.APORT_NM AND R11.CY_CD = R11.CY_CD
Query Completion as GraphSearch Problem Given: An incomplete input query topic Ti Find a set of links to complete the topic (to make Ti connected) Minimum Missing Information principle: The query completion candidate Tc (the missing links and nodes) for an incomplete input topic Ti contains the minimum information
Query Formulation Algorithm • Input: subgraph T of the semantic graph G • Find candidates with the minimum Information measure • Two methods used to limit the search scope: • L-step-bound paths: paths that connect two components with at most L links, to limit search within the neighborhood of the input subgraph • k-minimum completion candidates: only at most k candidates with minimum Information measure are kept (alpha-beta pruning)
Initial Components and 2-Step-BoundPaths For the “CAN LAND” Query repair (1) airfield_chars airports 2 can land have authorize airfield_chars runways (2) airfield_chars aircrafts airports 1 2 have runways airports (3) 1 airports at is a (4) airports runways geoloc 1 1 at located country (5) runways country geoloc 1 1 is a located (6) airports country geoloc 1 1 (a) Initial components (b) 2-step-bound paths
The Semantic Graph For theTransportation Domain at can land airfield_chars runways 1 2 1 have located is a airports geoloc country 1 1 Relation Node weather
Incremental Query Formulation • Incremental Query Formulation • To assist user reach a complex query goal with a series of simple queries • The subsequent queries may depend on results of preceding queries (derived relations) • Issues • Incorporate derived relations into the semantic graph • Suggest missing attributes to link isolated derived nodes to the graph
Incremental Query Examples • Find airports in Tunisia. • Which of these airports can land a C-5? • What is the weather at these airports?
Incorporating Derived Relations • Source relation: contributes attributes to the derived relations • Derived relation: inherits properties of attributes from their source relations • Deriving link: links to the source relations through inherited keys • Inherited link: inherits links from the source relations
airporttunisiacanland airporttunisiacanlandweather Extended semantic graph showing derived nodes, derived links and inherited links at can land airfield_chars runways 1 2 1 have located is a airports geoloc country 1 1 airporttunisia weather Relation Node Derived Node Derived Link Inherited Link
Suggesting KeyAttributes for a Query • Find source relations for the isolated derived relation. • Suggest key of the source relations as attributes to include.
Conclusions • Semantic graph model provides a basis for query formulation search • Ranking of query candidates by information measure in formulation provides adaptive behavior • Incremental query formulation is effective for complex queries • GUI and voice interface can be built for query formulation from high-level concepts