Intelligent Querying of Web Documents Using a Deductive XML Repository Nick Bassiliades, Ioannis Vlahavas Dept. of Informatics Aristotle University of Thessaloniki
Abstract
• X-DEVICE is a deductive OODB system
• It is used for storing XML documents as objects
• X-DEVICE has a powerful rule-based query language for
  • intelligently querying stored XML documents
  • publishing the results
• The rule language features:
  • second-order syntax
  • generalized path and ordering expressions
• Metadata are used to translate the extended features into first-order rules
Object Model of XML Data
• DTD definitions are automatically translated into a class schema
• XML documents are automatically translated into objects
• Generated classes and objects are stored within the underlying OODB ADAM
• ADAM is an OODB built on Prolog (Norman Paton, Peter M.D. Gray, Univ. of Aberdeen)
Object Model of XML Data: W3C XQuery TEXT Use Case
<!ELEMENT company (name, ticker_symbol?, description?, business_code, partners?, competitors?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT ticker_symbol (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT business_code (#PCDATA)>
<!ELEMENT partners (partner+)>
<!ELEMENT partner (#PCDATA)>
<!ELEMENT competitors (competitor+)>
<!ELEMENT competitor (#PCDATA)>
[Figure: generated class schema. company: name, ticker_symbol?, description?, business_code, partners?, competitors?; partners: partner+; competitors: competitor+]
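The DTD-to-schema translation can be illustrated with a minimal Python sketch. It is not the X-DEVICE translator: the function name `dtd_to_schema`, the dictionary layout, and the decision to treat `#PCDATA` elements as plain string attributes of their parent are assumptions made here to mirror the class diagram on this slide. It only handles flat sequences, not nested groups or alternation.

```python
import re

# DTD from the W3C XQuery TEXT use case, as shown on the slide.
DTD = """
<!ELEMENT company (name, ticker_symbol?, description?, business_code, partners?, competitors?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT ticker_symbol (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT business_code (#PCDATA)>
<!ELEMENT partners (partner+)>
<!ELEMENT partner (#PCDATA)>
<!ELEMENT competitors (competitor+)>
<!ELEMENT competitor (#PCDATA)>
"""

# Matches declarations whose content model has no nested parentheses.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")


def dtd_to_schema(dtd):
    """Map each non-text element to a list of attribute descriptions (hypothetical layout)."""
    schema = {}
    for name, model in ELEMENT_RE.findall(dtd):
        model = model.strip()
        if model == "#PCDATA":
            # Assumption: text-only elements become string attributes, not classes.
            continue
        attrs = []
        for part in model.split(","):
            part = part.strip()
            occurrence = part[-1] if part[-1] in "?+*" else ""
            attrs.append({
                "name": part.rstrip("?+*"),
                "optional": occurrence in ("?", "*"),
                "multi_valued": occurrence in ("+", "*"),
            })
        schema[name] = attrs
    return schema


schema = dtd_to_schema(DTD)
for class_name, attrs in schema.items():
    print(class_name, [a["name"] for a in attrs])
```

Under these assumptions the sketch yields three classes (company, partners, competitors), matching the boxes in the figure.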
Object Model of XML Data: Alternation
<!ELEMENT content (par | figure)+ >
[Figure: generated class schema. content: content_alt1+; par: …; figure: …; content_alt1: par, figure]
Deductive XML Query Language
• The X-DEVICE language is an extension of DEVICE, the basic deductive rule language
  • N. Bassiliades, I. Vlahavas, A.K. Elmagarmid, E-DEVICE: An extensible active knowledge base system with multiple rule type support, IEEE TKDE, 12(5), 824-844, 2000.
• X-DEVICE rules are pre-compiled into DEVICE deductive rules
• Deductive rules are compiled into production rules
• ECA rules with one complex event
• Matching through RETE network
X-DEVICE Language: Basic first-order deductive rules
if C@company(name=‘XYZ Ltd’, partner.partners ∋ P)
then partner_of_xyz(partner:P)
• Selects company C with name ‘XYZ Ltd’
• Iterates over partners P through navigation
• Paths are written in inverse notation: partner.partners, NOT partners.partner
• Defines a new derived class of partners of company XYZ
• Derived objects are materialized
X-DEVICE Language: Recursion
if P@partner_of_xyz(partner:P1) and
   C@company(name=P1, partner.partners ∋ P2)
then partner_of_xyz(partner:P2)
• Rule processing uses semi-naïve evaluation
• Negation is allowed (safety, stratification)
• Single-valued attributes use : for instantiation
• Multi-valued attributes use ∋ for instantiation
• Prolog lists guarantee correct ordering
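The combined effect of the base rule and the recursive rule for partner_of_xyz can be sketched in Python with a semi-naïve loop: each round joins only the facts derived in the previous round (the delta) against the company data. The sample data, the function name `partners_of`, and the dictionary encoding of company objects are illustrative assumptions; this is not how DEVICE compiles rules internally.

```python
# Hypothetical company data: name -> ordered list of partner names.
COMPANIES = {
    "XYZ Ltd": ["Alpha", "Beta"],
    "Alpha": ["Gamma"],
    "Beta": ["Gamma", "Alpha"],
    "Gamma": ["Alpha"],        # cycle back to Alpha; the loop must still terminate
    "Omega": ["XYZ Ltd"],      # not reachable from XYZ Ltd
}


def partners_of(root, companies):
    """Direct and indirect partners of `root`, in derivation order."""
    # Base rule: the direct partners of the root company.
    derived = []
    for partner in companies.get(root, []):
        if partner not in derived:
            derived.append(partner)

    # Recursive rule, semi-naive: only last round's new facts are expanded.
    delta = list(derived)
    while delta:
        new_delta = []
        for p1 in delta:
            for p2 in companies.get(p1, []):
                if p2 not in derived and p2 not in new_delta:
                    new_delta.append(p2)
        derived.extend(new_delta)
        delta = new_delta
    return derived


partner_of_xyz = partners_of("XYZ Ltd", COMPANIES)
print(partner_of_xyz)
```

The loop stops when a round derives nothing new, which is the fixpoint. Using ordered lists rather than sets keeps the derivation order stable, loosely echoing the slide's remark about Prolog lists and ordering.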
X-DEVICE Language: Variable-Attribute Expressions
if C@company(A $ ‘XYZ’)
then a_xyz_comp(company:list(C))
• We don’t know which attribute of company contains the string ‘XYZ’
• A is a second-order variable (meta-variable)
• list is an aggregation function (collects company OIDs in a multi-valued attribute)
• The $ operator performs string search
X-DEVICE Language: Translation of Variable-Attributes
if company@xml_seq(elem_order ∋ A)
then new_rule(‘if C@company(A $ ‘XYZ’) then a_xyz_comp(company:list(C))’) => deductive_rule
• Iterate over meta-class xml_seq to find all attributes (sub-elements) of class company
• A production rule creates one deductive rule for each instantiation of A
• A is now a first-order variable in the condition and a constant in the action
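The idea of producing one first-order rule per attribute can be sketched in Python as plain template expansion over schema metadata. The `XML_SEQ` dictionary stands in for the xml_seq meta-class, and `expand_variable_attribute` is a hypothetical helper; the real system creates rule objects through a production rule rather than formatting strings.

```python
# Stand-in for the xml_seq meta-class: class name -> ordered sub-elements.
XML_SEQ = {
    "company": ["name", "ticker_symbol", "description",
                "business_code", "partners", "competitors"],
}

# The meta-variable A of the second-order rule becomes the {attr} placeholder.
RULE_TEMPLATE = "if C@{cls}({attr} $ 'XYZ') then a_xyz_comp({cls}:list(C))"


def expand_variable_attribute(cls, template, xml_seq):
    """Return one concrete rule text per attribute of `cls`."""
    return [template.format(cls=cls, attr=attr) for attr in xml_seq[cls]]


rules = expand_variable_attribute("company", RULE_TEMPLATE, XML_SEQ)
for rule in rules:
    print(rule)
```

Each generated rule text has the attribute name fixed as a constant, which corresponds to the slide's last point: A is a variable while the metadata is being matched and a constant inside the created rule.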
X-DEVICE Language: Generalized Path Expressions
if C@company(* $ ‘XYZ’)
then a_xyz_comp(company:list(C))
• The search for the string ‘XYZ’ must be applied
  • not only to attributes of company
  • but also to attributes of objects contained within company
  • at all levels of nesting
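The intended meaning of the * path (search every attribute at every nesting level) can be approximated in Python with a recursive walk over nested objects. This sketch evaluates the query directly on sample data and does not reflect how X-DEVICE answers it; the sample companies and the names `contains_anywhere` and `a_xyz_comp` as a Python list are assumptions for illustration.

```python
# Hypothetical company objects; nested dicts play the role of contained objects.
COMPANIES = [
    {"name": "XYZ Ltd", "business_code": "Software"},
    {"name": "Acme", "business_code": "Retail",
     "partners": {"partner": ["XYZ Ltd", "Foo Inc"]}},
    {"name": "Foo Inc", "business_code": "Retail",
     "competitors": {"competitor": ["Bar"]}},
]


def contains_anywhere(value, needle):
    """True if `needle` occurs in any string reachable from `value`."""
    if isinstance(value, str):
        return needle in value
    if isinstance(value, dict):
        return any(contains_anywhere(v, needle) for v in value.values())
    if isinstance(value, list):
        return any(contains_anywhere(v, needle) for v in value)
    return False


# Collect the matching companies, analogous to list(C) in the rule.
a_xyz_comp = [c["name"] for c in COMPANIES if contains_anywhere(c, "XYZ")]
print(a_xyz_comp)
```

Here "Acme" matches only because of a nested partner value, which is the case a single-level attribute search would miss.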
X-DEVICE Language: Translation of Generalized Paths
[Figure: class schema for company, partners, and competitors]
• Iterate over all immediate elements of class company
• Store them into an auxiliary derived class
if company@xml_seq(elem_order ∋ X1)
then tmp_elem1(cnd_elem:X1, path:[X1])
X-DEVICE Language: Translation of Generalized Paths
[Figure: class schema for company, partners, and competitors]
• Recursively iterate over all elements and sub-elements stored in the auxiliary class
• The path-so-far from the root company element is accumulated
if X1@tmp_elem1(cnd_elem:X2, path:X3) and
   X2@xml_seq(elem_order ∋ X4)
then tmp_elem1(cnd_elem:X4, path:[X4|X3])
X-DEVICE Language: Translation of Generalized Paths
• Terminate the recursion if no more nested elements can be found
• Create one deductive rule for each “discovered” concrete path
if X1@tmp_elem1(cnd_elem:X2, path:X3) and
   not X2@xml_seq and
   prolog{create_path(X3,PATH)}
then new_rule(‘if C@company(PATH $ ‘XYZ’) then a_xyz_comp(company:list(C))’) => deductive_rule
X-DEVICE Language: Translation of Generalized Paths
• The following deductive rules are created
C@company(name $ ‘XYZ’)
C@company(ticker_symbol $ ‘XYZ’)
C@company(description $ ‘XYZ’)
C@company(business_code $ ‘XYZ’)
C@company(partner.partners $ ‘XYZ’)
C@company(competitor.competitors $ ‘XYZ’)
• Optimization of multiple rules is achieved through common parts of the RETE network
• The DEVICE system takes care of that
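The three translation steps (seed the auxiliary class, extend paths recursively, emit a rule at each leaf) can be sketched in Python as a worklist over schema metadata. `XML_SEQ` stands in for the xml_seq meta-class and the `frontier` list for tmp_elem1. Joining the accumulated list with dots is an assumed stand-in for create_path, whose definition is not shown on the slides. The sketch also assumes an acyclic schema.

```python
# Stand-in for the xml_seq meta-class: class name -> ordered sub-elements.
# Elements without an entry are leaves (text-only elements).
XML_SEQ = {
    "company": ["name", "ticker_symbol", "description",
                "business_code", "partners", "competitors"],
    "partners": ["partner"],
    "competitors": ["competitor"],
}


def concrete_paths(root, xml_seq):
    """Enumerate leaf paths below `root` in inverse notation (innermost first)."""
    # Step 1: seed with the immediate elements of the root class.
    frontier = [(child, [child]) for child in xml_seq[root]]
    paths = []
    while frontier:
        elem, path = frontier.pop(0)
        children = xml_seq.get(elem)
        if not children:
            # Step 3: no nested elements, so this path is concrete.
            paths.append(".".join(path))
        else:
            # Step 2: prepend each sub-element, like path:[X4|X3] in the rule.
            for child in children:
                frontier.append((child, [child] + path))
    return paths


paths = concrete_paths("company", XML_SEQ)
generated = [f"C@company({p} $ 'XYZ')" for p in paths]
for condition in generated:
    print(condition)
```

Because new elements are prepended to the accumulated list, the resulting paths come out in the inverse notation used by the language, for example partner.partners rather than partners.partner.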
Advantages of X-DEVICE
• Logic-based query languages have
  • well-understood mathematical properties
  • a declarative nature
  • advanced optimization techniques (magic sets)
• X-DEVICE compared to XQuery (functional)
  • higher-level, declarative syntax
  • more compact and comprehensible
  • general path expressions
    • due to fixpoint semantics and second-order variables
Advantages of X-DEVICE
• Users can express complex XML document views
  • Information customization for e-commerce, e-learning, etc.
• X-DEVICE offers multiple knowledge representation formalisms
  • Deductive, Production, and Active rules
  • Structured objects
• Production and Active rules can be used to update XML documents
• All the above can play an important role as an infrastructure for the Semantic Web
X-DEVICE site: www.csd.auth.gr/~lpis/systems/x-device.html