CXquery (Chamois Xquery) and its Applications Hwan-Seung Yong (용 환승, 龍 煥昇) Dept. of Computer Science and Engineering 梨花女子大學校 (Ewha Womans Univ.) Seoul, 大韓民國 H.S.Yong, EWU.
Contents
• Motivations of CXquery: Structure-Agnostic Query
• Query Processing Issues
• Developing CXquery System Experience
• CXquery to XQuery Conversion
• CXquery to XML Stream Query Processing
• Final Remarks
What is CXquery
• CXquery: Chamois XQuery
• Chamois
  • Project name of the Component-Based Knowledge Engineering System Framework at Ewha, led by Dr. Won Kim [IEEE 2002]
  • The chamois is an antelope that lives in the Alps
  • This animal needs only short steps to leap high
• CXquery is the same as XQuery except for one thing
  • Conditions are composed without XPath expressions
  • Only element/attribute names are used
Background
• In RDB
  • A query is composed from the schema (relation names, attribute names) and constant values for condition checking
  • The schema has a relatively simple structure
  • Queries are easy to learn and can be written by end users
• In OO and ORDB
  • The schema has a complex structure
  • So query composition and design is a very hard task
  • Only professionals do it
• In XML, what happens?
XML and Query Issues
• Queries are hard to compose, as in the OO/OR case
• Efforts so far have tried to design an SQL-like XML query language
  • XQuery, XPath: W3C standards
  • Are they SQL-like?
• XML even allows data with no schema (DTD unknown)
  • How do we write a query then?
• Natural language queries for RDB
  • For ease of use
• How about natural language queries for XML?
  • Or semi-natural language queries?
Some Aspects of XML
• XML is a meta-language for encoding domain information (book, movie, music, product, company, math, chemistry, etc.)
• Each domain needs a worldwide XML standard (MathML, CML, BioXML, etc.)
  • But there are not yet enough of them
• This means that data from the same domain can be encoded using different DTDs
  • There can be many kinds of movie DTDs and music DTDs
• An XQuery has to follow the DTD, so the same query must be expressed as a different XQuery for each DTD
DTD Design Choices for the Same Data
• Element representation
• Attribute representation
• Nested representation
• Combinations of the above
Element representation: 1st DTD Type
• XQuery depends on the DTD structure
① element representation
Attribute representation: 2nd DTD Type
  for $m in doc()//movie
  where $m/*[@genre = "action"] and $m/*[@year = "1994"] and $m/*[actor = "Jean Reno"]
  return <title>{data($m/*/@title)}</title>
② attribute representation
Nested representation: 3rd DTD Type
  for $y in doc()//year
  where $y[text() = "1994"]//genre[text() = "action"]//actor[text() = "Jean Reno"]
  return <title>{$y//title/text()}</title>

  <year>1994
    <country>America
      <genre>action
        <movie>
          <title>Leon</title>
          <director>Luc Besson</director>
          <actor>Jean Reno</actor>
        </movie>
        …
      </genre>
    </country>
    …
  </year>
③ nested representation
Combinations: 4th DTD Type
  for $g in doc()//genre
  where $g[@* = "action"]//year[* = "1994"]//actor[text() = "Jean Reno"]
  return <title>{$g//title/text()}</title>

  <genre type="action">
    <country name="America">
      <year><yyyy>1994</yyyy>
        <movie>
          <title>Leon</title>
          <people director="Luc Besson" actor="Jean Reno">
          </people>
        </movie>
        …
      </year>
    </country>
  </genre>
④ nested + attributes + elements representation
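To make the dependence on the DTD concrete, the following Python sketch encodes one movie in an element representation and in an attribute representation and shows that a query written for one layout finds nothing in the other. The two sample documents, the function names, and the use of Python's xml.etree.ElementTree are assumptions made for illustration; they are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same movie in two representations.
ELEMENT_STYLE = """
<movies>
  <movie>
    <title>Leon</title><genre>action</genre>
    <year>1994</year><actor>Jean Reno</actor>
  </movie>
</movies>"""

ATTRIBUTE_STYLE = """
<movies>
  <movie title="Leon" genre="action" year="1994" actor="Jean Reno"/>
</movies>"""

def titles_element_style(root):
    # Query written against the element representation (child elements).
    return [m.findtext("title") for m in root.iter("movie")
            if m.findtext("genre") == "action"
            and m.findtext("year") == "1994"
            and m.findtext("actor") == "Jean Reno"]

def titles_attribute_style(root):
    # Query written against the attribute representation.
    return [m.get("title") for m in root.iter("movie")
            if m.get("genre") == "action"
            and m.get("year") == "1994"
            and m.get("actor") == "Jean Reno"]

elem_root = ET.fromstring(ELEMENT_STYLE)
attr_root = ET.fromstring(ATTRIBUTE_STYLE)
```

Each function returns the title only for the layout it was written for, which is the per-DTD rewriting that a structure-agnostic query is meant to avoid.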
Independence and DBMS
• But with XML in a heterogeneous(?) distributed environment
  • Each XQuery depends heavily on its DTD
  • Unless a single DTD is defined and the XML data is converted to it, a different XQuery has to be written for each DTD
Real SQL-like XML Query
• Rather than using XPath expressions
  • /continent/country/state/city/name = 'Kyoto'
  • //city/name = 'Kyoto'
  • //city//* = 'Kyoto'
  • //city/@name = 'Kyoto'
• Just use
  • name = 'Kyoto'
• Just use the element or attribute name instead of an XPath
• "Find information about the city named Kyoto"
  • A natural language query requires heavy semantic processing
CXquery Approach
• Assumptions
  • The user has to know the exact tag names (element/attribute) and values
  • The user does not know the structure (DTD) of the XML
• Query example
  • Search for movie titles whose genre is 'action', whose release year is '1994', and whose stars include 'Jean Reno'
  • genre = "action" and year = "1994" and actor = "Jean Reno"
• Applied to XQuery
  for $t in doc()//title
  where genre = "action" and year = "1994" and actor = "Jean Reno"
  return $t
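As a small illustration of how path-free conditions can be handled, the sketch below splits a CXquery where clause into (name, value) pairs, which is all the information the user supplies. The grammar is an assumption (terms of the form name = "value"); the connectives between terms are ignored, and the function name parse_conditions is hypothetical.

```python
import re

# Assumed term grammar: a tag name, '=', and a double-quoted constant.
_TERM = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')

def parse_conditions(where_clause):
    """Return the (name, value) pairs found in a CXquery condition clause."""
    return [(m.group(1), m.group(2)) for m in _TERM.finditer(where_clause)]

pairs = parse_conditions('genre = "action" and year = "1994" and actor = "Jean Reno"')
```

The pairs carry no structural information at all; locating each name and value in a document is left to the query processor.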
Four Query Processing Issues
• First, 'similarity matching' is required
  • When the schema or DTD of the XML documents is not precisely known, or when a "fuzzy" (approximate) search is done, even the precise names of the elements and attributes may not be known
  • Use thesaurus-based matching
  • E.g., for the element names "actor", "genre", and "year", the query processor may also need to search for names such as "performer", "category", and "date", respectively
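Thesaurus-based name matching can be sketched as a lookup that expands each query name into a set of acceptable tag names. The dictionary below is a hypothetical stand-in for a real thesaurus and contains only the synonym pairs mentioned on the slide; the function names are assumptions.

```python
# Hypothetical thesaurus; entries follow the example names on the slide.
THESAURUS = {
    "actor": {"performer"},
    "genre": {"category"},
    "year": {"date"},
}

def expand_name(name, thesaurus=THESAURUS):
    """Return the query name plus its listed synonyms, all lowercased."""
    key = name.lower()
    return {key} | {s.lower() for s in thesaurus.get(key, set())}

def name_matches(tag, query_name, thesaurus=THESAURUS):
    """True if a document tag equals the query name or one of its synonyms."""
    return tag.lower() in expand_name(query_name, thesaurus)
```

For example, name_matches("Performer", "actor") holds, while name_matches("director", "actor") does not.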
Query Processing Issues
• Second, the same content can be represented heterogeneously in XML
  • Elements and/or an attribute may intervene between an element name and its corresponding value
  • Figure (a): one example DTD, element representation
  • Figure (b): type intervenes between genre and "action", name intervenes between actor and "Jean Reno", and yyyy intervenes between year and "1994"
  • Figure (c): genre, year, and actor are represented as attributes
  • Figure (d): genre, year, and actor are represented as elements, but their values are represented as attribute values
• This introduces significant implementation difficulties
  • The processor should consider all possible representations
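The need to consider every representation can be illustrated with a matcher that tests one name = value predicate against several forms at once. This is a minimal sketch, not the processor described in the talk; the form labels, the sample document, and the function name match_forms are assumptions.

```python
import xml.etree.ElementTree as ET

def match_forms(root, name, value):
    """Return the set of representation forms under which name = value holds."""
    forms = set()
    for elem in root.iter():
        if elem.get(name) == value:
            forms.add("attribute")                # <movie genre="action">
        if elem.tag != name:
            continue
        if (elem.text or "").strip() == value:
            forms.add("element-text")             # <genre>action</genre>
        if value in elem.attrib.values():
            forms.add("element-attribute-value")  # <genre type="action"/>
        if any((d.text or "").strip() == value
               for d in elem.iter() if d is not elem):
            forms.add("intervening-element")      # <year><yyyy>1994</yyyy></year>
    return forms

# Hypothetical document mixing the four forms.
DOC = ET.fromstring(
    '<movie actor="Jean Reno"><genre type="action"/>'
    '<year><yyyy>1994</yyyy></year><title>Leon</title></movie>'
)
```

The "intervening-element" form is exactly the case in which the binding between name and value becomes semantically uncertain, so a real processor would need to treat such matches with more care than this sketch does.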
[Figures (a), (b), (c), (d): four example representations of the same data]
Query Processing Issues
• Third, intervening elements (<family>) and/or an attribute between an element and its corresponding value
  • lead to "semantic uncertainty" in the association between the element and the value
  • "Jean Reno" is the value associated with the element or attribute "family" of "actor"
  • Blind binding of actor to "Jean Reno" is possible, which declares the search predicate actor = "Jean Reno" true
• Ex)
  <actor>
    <family>
      <name>Jean Reno</name>
      …
    </family>
  </actor>
• Semantic correctness may be in question!
Query Processing Issues
• Fourth, the nearest common ancestor (NCA) needs to be identified
  • for all element and attribute names that appear in the search predicates
  • For query-processing optimization
  • For preventing erroneous results
Query Processing Issues
• However, the problem is difficult
  • since the structure of the XML hierarchy is not specified in CXquery
• Ex) NCA of year, genre, and actor
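One way to picture the NCA computation is to compare the ancestor chains of the matched nodes. The sketch below does this with Python's xml.etree.ElementTree on a hypothetical document; the function name and sample data are assumptions, and a real processor would work from its indexes rather than from an in-memory tree.

```python
import xml.etree.ElementTree as ET

def nearest_common_ancestor(root, nodes):
    """Return the deepest element that contains every element in `nodes`."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent = {child: p for p in root.iter() for child in p}

    def chain(node):
        # Ancestor chain from the root down to `node` itself.
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path[::-1]

    nca = None
    for level in zip(*(chain(n) for n in nodes)):
        if any(x is not level[0] for x in level):
            break
        nca = level[0]
    return nca

# Hypothetical sample: two movies under one country.
DOC = ET.fromstring(
    "<year><yyyy>1994</yyyy><country>"
    "<movie><title>Leon</title><genre>action</genre><actor>Jean Reno</actor></movie>"
    "<movie><title>Other</title><genre>drama</genre></movie>"
    "</country></year>"
)
first, second = DOC.findall(".//movie")
same_movie = nearest_common_ancestor(DOC, [first.find("genre"), first.find("actor")])
cross_movie = nearest_common_ancestor(DOC, [second.find("genre"), first.find("actor")])
```

Here same_movie is the first movie element, while cross_movie is the country element. Under this reading, an NCA that sits above the expected record suggests that the predicates were satisfied by values from different movies, which is one kind of erroneous result the NCA check can help rule out.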
One Approach to Support CXquery
• Implementation
  • Condition clause: genre = "action" AND year = "1993" AND actor = "Tommy Lee Jones"
    • Data names: element/attribute names
    • Data values
    • Structure??
  • Result clause: title
One Approach to Support CXquery
• Implementation of an XML server based on CXquery
• Special indexing is used
  • Node index: all element and attribute names
  • Value index: all constant values in the XML
  • All nodes and values are numbered so that their structural relationships can be found
  • The indices are stored in an RDB
• Performance evaluation shows promising results [ISMIS 2005]
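The idea of keeping a node index and a value index in a relational database can be sketched with SQLite. The table layout, column names, and numbering below are assumptions chosen for brevity; the schema used in the actual system is not given in the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_indexes(xml_text):
    """Load a node index and a value index into an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node_index (node_id INTEGER, name TEXT, kind TEXT)")
    db.execute("CREATE TABLE value_index (node_id INTEGER, value TEXT)")
    next_id = 0
    for elem in ET.fromstring(xml_text).iter():
        next_id += 1
        elem_id = next_id
        db.execute("INSERT INTO node_index VALUES (?, ?, 'element')",
                   (elem_id, elem.tag))
        text = (elem.text or "").strip()
        if text:
            db.execute("INSERT INTO value_index VALUES (?, ?)", (elem_id, text))
        for attr, val in elem.attrib.items():
            # Attributes get their own node numbers in this sketch.
            next_id += 1
            db.execute("INSERT INTO node_index VALUES (?, ?, 'attribute')",
                       (next_id, attr))
            db.execute("INSERT INTO value_index VALUES (?, ?)", (next_id, val))
    return db

def nodes_with(db, name, value):
    """Look up nodes whose name and directly attached value both match."""
    rows = db.execute(
        "SELECT n.node_id, n.kind FROM node_index n "
        "JOIN value_index v ON n.node_id = v.node_id "
        "WHERE n.name = ? AND v.value = ? ORDER BY n.node_id",
        (name, value),
    )
    return rows.fetchall()

db = build_indexes('<movie year="1994"><title>Leon</title><genre>action</genre></movie>')
```

This lookup only covers values attached directly to the named node; values reached through intervening nodes would additionally need the structural numbering mentioned on the slide.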
One Approach: All Possible Paths
• The query processor should derive all paths among the names and values
  • Identification of the relationship between each name and its value
  • Identification of the relationships between names
• A classification of all possible paths that XML can have is investigated
• Example: genre = "action" AND year = "1993" AND actor = "Tommy Lee Jones"
[Figure: classification of possible path types, one tree diagram per type. Panels: Path m-HEA, Path d-FE-FA, Path d-FA-FE, Path m-FE, Path m-FEA, Path m-HE, Path d-lHE-FE, Path d-ulHE-FE, Path d-uHE-FE, Path d-uHE-HE, Path d-lHE-HE, Path d-FE-FEA, Path d-FEA-FEA, Path d-ulHE-HE, Path d-HE-HA, Path d-HA-HE, Path d-HE-HEA, Path d-HEA-FE, Path d-HEA-HE, Path d-HEA-HEA. Node labels used in the diagrams: C1E … CnE, C1A … CnA, CkE, CmA, iE, iA, V1 … Vn.]
One Approach
• To search all possible paths, a node numbering scheme is applied to each node in the XML
  <year yyyy="1994">
    <country name="America">
      <movie>
        <title>Leon</title>
        <genre type="drama" type="action"></genre>
        <people>
          <director>Luc Besson</director>
          <actor>
            <name>Jean Reno</name>
            <name>Natalie Portman</name>
          </actor>
        </people>
      </movie>
      ….
    <country name="France">
      ….
  </year>
Node numbering to identify relationships
[Figure: XML tree in which each node carries a pair of region numbers, e.g. (10,1000), (20,25), (30,490), (500,990)]
Doc-ID  Start-Region  End-Region  name
1       10            220         movie
1       20            30          year
1       40            110         Basic-info
1       120           210         people
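The start/end region columns suggest a containment test: a node is an ancestor of another when its region strictly encloses the other's. The following sketch assigns such regions in document order and applies that test. The step size, the numbering rule, and the function names are assumptions; the exact scheme used in the system is not spelled out in the slides.

```python
import xml.etree.ElementTree as ET

def number_regions(root, step=10):
    """Assign each element a (start, end) region by depth-first traversal."""
    regions = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += step
        start = counter
        for child in elem:
            visit(child)
        counter += step
        regions[elem] = (start, counter)

    visit(root)
    return regions

def is_ancestor(regions, a, d):
    """True if the region of `a` strictly contains the region of `d`."""
    (sa, ea), (sd, ed) = regions[a], regions[d]
    return sa < sd and ed < ea

root = ET.fromstring(
    "<movie><year>1994</year><people><actor>Jean Reno</actor></people></movie>"
)
year, people = root.find("year"), root.find("people")
actor = people.find("actor")
regions = number_regions(root)
```

With regions stored as two integer columns, the ancestor test becomes a pair of comparisons, which is convenient when the index lives in a relational table.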
Processing Flow Diagram Overview
• An XML server was implemented to evaluate the performance of the query expression
CXquery to XQuery Conversion
• System diagram overview
[Diagram components: User, CXquery, CXquery to XQuery Converter, XQuery, DTD/Result, Result, XML Server (eXist 1.0), XML DB, XML Document, DTD 1 with XML, DTD 2 with XML]
CXquery to XQuery Converter
• A set of XQueries should be generated for one CXquery, depending on the number of different DTDs
CXquery:
  for $c in doc()
  where genre = "action" and actor = "Jean Reno"
  return title
XQuery:
  for $c in /movies
  where $c/genre = "action" and $c/actor = "Jean Reno"
  return $c/title
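The per-DTD generation step can be sketched as string templating: given, for one DTD, the relative path that each tag name resolves to, the same CXquery conditions yield a different XQuery. The mapping dictionaries, the context path, and the function name to_xquery are hypothetical, and the sketch does not escape quotes inside values.

```python
def to_xquery(context_path, name_paths, conditions, result_name):
    """Build one XQuery string for one DTD from path-free conditions."""
    preds = " and ".join(
        f'$c/{name_paths[name]} = "{value}"' for name, value in conditions
    )
    return (f"for $c in {context_path} where {preds} "
            f"return $c/{name_paths[result_name]}")

conditions = [("genre", "action"), ("actor", "Jean Reno")]

# Hypothetical name-to-path mappings for two DTD styles.
element_dtd = {"genre": "genre", "actor": "actor", "title": "title"}
attribute_dtd = {"genre": "@genre", "actor": "@actor", "title": "@title"}

xq_elements = to_xquery("/movies", element_dtd, conditions, "title")
xq_attributes = to_xquery("/movies/movie", attribute_dtd, conditions, "title")
```

Running the converter once per DTD gives the set of XQueries that corresponds to a single CXquery.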
XQuery for Each DTD Type
System Flow Diagram
[Diagram components: Input, CXquery, DTD File, XML Stream File, CXqueries Processing, DTD Path Generator, CXquery Converter, Path Set, XML Stream, XPath Queries, YFilter XML Stream Engine, Output, XML document]
CXquery to XQuery Conversion
(a) DTD path set (path_mondial-cities.txt)
  /cities/city/name
  /cities/city/latitude
  /cities/city/population
  /cities/city/located_at
  /cities/city[@is_country_cap]
  /cities/city[@is_state_cap]
  /cities/city/population[@year]
  /cities/city/located_at[@watertype]
  …
(b) CXquery
  CXQ1: is_country_cap="yes" or latitude
  CXQ2: car_code="MK" and area="25333"
  CXQ3: name="Caspian Sea" or area="17000"
  CXQ4: latitude
  CXQ5: ethnicgroups
  CXQ6: name
  CXQ7: country="Korea"
(c) XPath set from (a) and (b), and (d) XQuery generation
  /cities/city/latitude
  /cities/city[@is_country_cap="yes"]
  /cities/city[name="Caspian Sea"]
  /cities/city/name
  /cities/city/name
  /cities/city/latitude
  /cities/city[@is_country_cap]
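The matching of CXquery terms against the DTD path set can be sketched as follows. The conversion rule is inferred from the example paths on the slide and is an assumption: a name matches a path whose last step is that element or that attribute test, and a value, when present, is turned into a predicate. Only the paths listed on the slide are used; the function name to_xpaths is hypothetical.

```python
import re

# Paths listed on the slide (the original path set continues beyond these).
PATH_SET = [
    "/cities/city/name",
    "/cities/city/latitude",
    "/cities/city/population",
    "/cities/city/located_at",
    "/cities/city[@is_country_cap]",
    "/cities/city[@is_state_cap]",
    "/cities/city/population[@year]",
    "/cities/city/located_at[@watertype]",
]

def to_xpaths(name, value=None, path_set=PATH_SET):
    """Return the XPath expressions that one CXquery term expands to."""
    out = []
    for path in path_set:
        attr = re.fullmatch(r"(.*)\[@([^\]]+)\]", path)
        if attr:
            # Path ends in an attribute test such as city[@is_country_cap].
            if attr.group(2) == name:
                out.append(path if value is None
                           else f'{attr.group(1)}[@{name}="{value}"]')
        else:
            # Path ends in an element step such as city/name.
            parent, _, leaf = path.rpartition("/")
            if leaf == name:
                out.append(path if value is None
                           else f'{parent}[{name}="{value}"]')
    return out
```

A term whose name appears in no listed path, such as ethnicgroups here, expands to an empty list, so it contributes no XPath query for this DTD.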
XQuery Conversion for Each DTD
Implementation Result
• CXquery example
Matching Process of CXquery with Path Set
XQuery Conversion Results
Example: 6 CXqueries
Converted: 13 XQueries
CXquery for Distributed XML Servers
• In a heterogeneous DBMS environment
  • A single standard schema is required at the central server
  • Query translation is required
  • A query on the standard schema is translated into each site's schema
• Distributed CXquery environment
  • No standard XML schema is needed; a collection of each site's DTD is enough
  • Users compose queries using only CXquery
  • CXquery is DTD-neutral
  • The central site then converts the CXquery to each site's XQuery and collects the results
Heterogeneous XML Stream Query Processing
• Stream data is increasing
  • RSS streams, news streams, stock trading, sensor streams, multimedia streams, etc.
• A stream processing engine is needed
  • It must handle a large number of heterogeneous XML streams concurrently
• How do we use a single stream query on these multiple heterogeneous streams?
  • Translate the query for each stream and process each one differently?
  • Apply a single CXquery to multiple heterogeneous streams
Final Remarks
• CXquery, a query language with no paths, was introduced
• This area of research needs more work
• Technical issues for future research
  • Element/attribute-value association is required to solve the semantic ambiguity problem
  • name = "Kyoto" vs name = "Tanaka" vs name = "Winter Sonata"
Final Remarks
• Possible approaches
  • Define DTD tag names more specifically
    • Cityname = "Kyoto" vs person-name = "Tanaka" vs Movie-name = "Winter Sonata"
  • The system resolves domain conflicts precisely through data mining, etc.
    • "Kyoto" represents a city name, "Tanaka" represents a person name, etc.
  • The user specifies the exact domain name for all constants
    • name = "Kyoto[City]", name = "Tanaka[Person]", name = "Winter Sonata[Movie]"
    • An XML extension is required
Thank you for your attention
• Questions?
References
• [ISMIS 2005] Wol Young Lee, Hwan Seung Yong, "A Query Expression and Processing Technique for an XML Search Engine," ISMIS 2005: 15th International Symposium on Methodologies for Intelligent Systems, Saratoga Springs, NY, USA, May 2005, pp. 266-275.
• [JOT 2004b] Won Kim, Wol Young Lee, Hwan Seung Yong, "On Query-Processing Issues for Non-Navigational Queries for XML," Journal of Object Technology, Vol. 3, No. 10, November-December 2004, pp. 19-26.
• [JOT 2004a] Won Kim, Wol Young Lee, Hwan Seung Yong, "On Supporting Structure-Agnostic Queries for XML," Journal of Object Technology, Vol. 3, No. 7, July-August 2004, pp. 27-35.
• [JOT 2002] Won Kim et al., "The Chamois Reconfigurable Data-Mining Architecture," Journal of Object Technology, Vol. 1, No. 2, July-August 2002, pp. 2-10.
• [IEEE 2002] Won Kim et al., "Chamois: A Component-Based Knowledge Engineering Framework," IEEE Computer, Vol. 35, No. 5, May 2002, pp. 46-54.