A Logic Programming Approach to Supporting the Entries of XML Documents in an Object Database
Ching-Long Yeh 葉 慶 隆
Department of Computer Science and Engineering, Tatung University, Taipei 104, Taiwan
chingyeh@cse.ttu.edu.tw
XML Object Database
Introduction
• XML improves upon HTML in
  • capturing the meaning of a document and
  • extending the tag set.
• At the same time, it reduces the complexity of SGML.
• It is believed that XML will soon be the standard for data exchange on the Web.
Introduction
• Because files carry no indices, an XML document stored in a file cannot make full use of the meaning (or metadata) it contains.
• Since an XML document can easily be viewed according to the object-oriented model, a promising solution is to employ object database technology to manage access to XML documents.
Introduction
• In this talk, I will present our research on
  • the design and implementation of an XML object database, and
  • an extensible template-based query interface for accessing the XML object database.
The Remainder of the Talk
• An Introduction to XML
• Design and Implementation of an XML Object Database
• An Extensible Template-based Interface
An Introduction to XML
HyperText Markup Language
• HTML is the language used to create hyperlinked text on the WWW.
• The text is presented according to a set of predefined tags.
• The tags are defined by a Document Type Definition (DTD) written in SGML.
• In other words, HTML is an application of SGML on the WWW.
Standard Generalized Markup Language
• Central to SGML is the concept that documents have structure, content, and format.
• These three ingredients combine to form a document.
What is Content?
• Content is the actual data within a document.
• The words and illustrations that make up a bicycle assembly manual are its content.
What is Format?
• Format is how the words, sentences, and paragraphs are visually presented and distinguished from one another within a document.
• Boldface for titles, italics for special terms, and blank lines between sections are examples of document formats.
• People often confuse format with structure.
What is Structure?
Recipe
  Title: Coconut Pudding
  Ingredient List
    Ingredient: 12 ounces coconut milk
    Ingredient: 4 to 6 tablespoons sugar
    Ingredient: 4 to 6 tablespoons cornstarch
    Ingredient: 3/4 cup water
  Instruction List
    Step: Pour coconut milk into saucepan.
    Step: Combine sugar and cornstarch; stir in water and blend well.
    Step: Stir sugar mixture into coconut milk; cook and stir over low heat until thickened.
Document Type Definition
• Defining the structures in XML/SGML
• The structure of a document, that is, its type, is defined by a document type definition, or DTD.
• The DTD lays out the rules for a document through the use of elements, attributes, and entities.
Document Type Definition
• A DTD looks like this (SGML syntax; "- -" means that neither the start tag nor the end tag may be omitted):
<!ELEMENT recipe - - (title, ingredientList, instructionList)>
<!ELEMENT title - - (#PCDATA)>
<!ELEMENT ingredientList - - (ingredient*)>
<!ELEMENT instructionList - - (step*)>
<!ELEMENT ingredient - - (#PCDATA)>
<!ELEMENT step - - (#PCDATA)>
Document Instance
<!DOCTYPE RECIPE PUBLIC "recipe" "recipe">
<RECIPE>
  <TITLE>Coconut Pudding</TITLE>
  <INGREDIENTLIST>
    <INGREDIENT>12 ounces coconut milk</INGREDIENT>
    <INGREDIENT>4 to 6 tablespoons sugar</INGREDIENT>
    <INGREDIENT>4 to 6 tablespoons cornstarch</INGREDIENT>
    <INGREDIENT>3/4 cup water</INGREDIENT>
  </INGREDIENTLIST>
  <INSTRUCTIONLIST>
    <STEP>Pour coconut milk into saucepan.</STEP>
    <STEP>Combine sugar and cornstarch; stir in water and blend well.</STEP>
    <STEP>Stir sugar mixture into coconut milk; cook and stir over low heat until thickened.</STEP>
    …
  </INSTRUCTIONLIST>
</RECIPE>
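The recipe instance is a tree of elements, which is what makes the object-oriented view of a document natural. The following is a minimal Python sketch, not part of the talk, that walks such a tree with the standard library's xml.etree.ElementTree. It assumes a shortened, well-formed, lowercase XML rendering of the recipe (XML names are case-sensitive, unlike the SGML-style tags on the slide); the helper name `outline` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed XML rendering of the recipe (an assumption for
# this sketch; the slide shows an SGML-style instance with uppercase tags).
RECIPE_XML = """<recipe>
  <title>Coconut Pudding</title>
  <ingredientList>
    <ingredient>12 ounces coconut milk</ingredient>
    <ingredient>4 to 6 tablespoons sugar</ingredient>
  </ingredientList>
  <instructionList>
    <step>Pour coconut milk into saucepan.</step>
  </instructionList>
</recipe>"""


def outline(element, depth=0):
    """Return (depth, tag, text) triples for the element and its descendants."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(RECIPE_XML)
rows = outline(root)
for depth, tag, text in rows:
    # Indent each element by its depth to make the hierarchy visible.
    print("  " * depth + tag + (": " + text if text else ""))
```

Each triple records where an element sits in the hierarchy, which is the information a database schema would have to preserve.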
HTML, SGML, XML
• HTML helped establish the Internet by providing a universal way to present information.
• However, HTML only addresses the presentation of data.
• Using SGML, users can add structure along with the content of a document.
• However, SGML has proven too heavyweight for the Internet.
Extensible Markup Language
• XML is a simple dialect of SGML.
• HTML is sufficient for sending web pages that are viewed by human beings.
• XML, however, adds tags that enable computers to understand, act on, or process the information.
• XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
XML Application Profile
• Electronic commerce
• Electronic data interchange (EDI)
• Fine-grain content publishing
• Internet search engines
• Distributed application design
• etc.
Data Type Requirements of Documents
• HTML
  • One file per page
  • Simple uni-directional linking
• XML
  • Tens, hundreds, or even thousands of objects per page
  • Multiple DTDs
  • Hierarchical structure and rich linking
  • Query and navigation capabilities required
  • Agents and business rules interact with the data
Data Types of Storage
• File system
  • Stores monolithic items.
  • A folder system sits on top of the files.
  • Good at storing multimedia data
Data Types of Storage
• Relational database
  • Tabular in nature
  • Good at storing rows and columns of data, like spreadsheets, and data from forms, like invoices.
Data Types of Storage
• Object-oriented database
  • Good at managing structured, hierarchical, richly linked information.
  • That's exactly what XML is.
  • XML is the object representation of data.
Design and Implementation of an XML Object Database
Basic Idea
• The arrangement of elements in an XML document is governed by the element and attribute-list declarations in the document type definition.
• Creating a DTD is, in a sense, closely related to defining new data types and hierarchical relationships in an object database.
• Thus, to enter an XML document into an object database, a new schema corresponding to the DTD is first generated in the object database, and then the document conforming to that DTD is fragmented into objects and entered into the database.
Basic Idea
• Both tasks, creating a schema in the object database for a DTD and fragmenting XML documents into objects, can be divided into two parts: analysis and generation.
• For the former task, an input DTD is analyzed according to the formation rules specified in the XML recommendation, and schema definitions are produced for the structures found in the analysis of the DTD.
• The other task is to analyze XML document instances and produce object definitions for the elements found in them.
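The second task, fragmenting a document instance into objects, can be pictured with a short Python sketch. This is an illustration under assumptions, not the system described in the talk: it uses Python's xml.etree.ElementTree rather than a generated Prolog parser, and the record fields (`oid`, `class`, `parent`, `pcdata`) and the function name `fragment` are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET


def fragment(root):
    """Flatten an element tree into object records linked by parent id."""
    next_id = itertools.count(1)
    objects = []

    def visit(element, parent_id):
        oid = next(next_id)
        objects.append({
            "oid": oid,
            # Class name derived from the tag, e.g. 'title' -> 'Title'.
            "class": element.tag[0].upper() + element.tag[1:],
            "parent": parent_id,
            "pcdata": (element.text or "").strip(),
        })
        for child in element:
            visit(child, oid)

    visit(root, None)
    return objects


doc = ET.fromstring(
    "<recipe><title>Coconut Pudding</title>"
    "<ingredientList><ingredient>3/4 cup water</ingredient>"
    "</ingredientList></recipe>")
objects = fragment(doc)
for record in objects:
    print(record)
```

Every element becomes one record, and the `parent` field keeps the hierarchy that the DTD-derived schema is meant to describe.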
Basic Idea
• We use definite clause grammars (DCGs) in Prolog as the tool to implement the analysis and generation tasks.
• The basic idea is to encode the analysis task in the context-free rule part and the generation task in the action part of the DCG rules.
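The split between a rule part (analysis) and an action part (generation) can be imitated outside Prolog. The sketch below is a Python analogy, not a DCG: `rule`, `literal`, and `name` are hypothetical helpers, the input is assumed to be already tokenized, and the grammar covers only a simplified element declaration with `(#PCDATA)` content.

```python
def rule(recognizers, action):
    """Build a parser: run the recognizers in order (analysis), then pass
    the collected values to `action` (generation)."""
    def parse(tokens, pos=0):
        values = []
        for recognize in recognizers:
            result = recognize(tokens, pos)
            if result is None:
                return None  # analysis failed, so no output is generated
            value, pos = result
            values.append(value)
        return action(*values), pos
    return parse


def literal(expected):
    """Recognizer for one fixed token."""
    def recognize(tokens, pos):
        if pos < len(tokens) and tokens[pos] == expected:
            return expected, pos + 1
        return None
    return recognize


def name(tokens, pos):
    """Recognizer for an element name (simplified to a Python identifier)."""
    if pos < len(tokens) and tokens[pos].isidentifier():
        return tokens[pos], pos + 1
    return None


# Simplified: elementdecl ::= '<!ELEMENT' Name '(#PCDATA)' '>'
pcdata_decl = rule(
    [literal("<!ELEMENT"), name, literal("(#PCDATA)"), literal(">")],
    lambda _open, element, _spec, _close: ("contentModel", element, "pcdata"),
)

result = pcdata_decl(["<!ELEMENT", "title", "(#PCDATA)", ">"])
print(result)
```

The list of recognizers plays the role of the context-free rule body, and the lambda plays the role of the action in braces; a DCG additionally gets backtracking and unification for free, which this analogy does not attempt.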
Structured Document Database
• Combining structured documents with OODB technology:
  • VERSO project at INRIA
  • News-On-Demand Application
  • Document Database from GMD-IPSI
• XML document database products:
  • The Poet XML Repository
  • eXcelon, ODI
  • Ardent Software, Inc.
System Architecture
DTD Parser
elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
elementdecl(contentModel(N,C)) --> elementPrefix, name(N), contentSpec(C), rightAngle.
contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
contentSpec(C) --> empty, {C='EMPTY'}; any, {C='ANY'}; mixed(C); children(C).
Parsing Result
Input DTD:
<!ELEMENT top (p,spec,div1)>
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT spec (front,body, back?)*>
<!ELEMENT div1 (head,(p|list1 |note)*, div2*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT ul (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>
<!ELEMENT em (#PCDATA)>
<!ELEMENT front (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT back (#PCDATA)>
<!ELEMENT head (#PCDATA)>
<!ELEMENT list1 (#PCDATA)>
<!ELEMENT note (#PCDATA)>
<!ELEMENT div2 (#PCDATA)>
Parsing result:
[contentModel(top,seq([p/null,spec/null,div1/null])/null),
 contentModel(p,mixed([pcdata,a,ul,b,i,em])),
 contentModel(spec,seq([front/null,body/null,back/question])/star),
 contentModel(div1,seq([head/null,alt([p/null,list1/null,note/null])/star,div2/star])/null),
 contentModel(name,pcdata),
 contentModel(a,pcdata),
 contentModel(ul,pcdata),
 contentModel(b,pcdata),
 contentModel(i,pcdata),
 contentModel(em,pcdata),
 contentModel(front,pcdata),
 contentModel(body,pcdata),
 contentModel(back,pcdata),
 contentModel(head,pcdata),
 contentModel(list1,pcdata),
 contentModel(note,pcdata),
 contentModel(div2,pcdata)]
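To make the mapping from element declarations to content models concrete, here is a small recursive-descent sketch in Python. It is a stand-in for illustration, not the Prolog DCG parser of the talk. The tuple encoding is an assumption chosen to mirror the terms above: `("seq", items, occurrence)` for `seq(...)/occurrence`, `(name, occurrence)` for `name/occurrence`, `("mixed", names)` for mixed content, and the string `"pcdata"` for `(#PCDATA)`. It handles only well-formed declarations of the kinds shown on the slide.

```python
import re

OCCURRENCE = {"?": "question", "*": "star", "+": "plus"}
TOKEN = re.compile(r"#PCDATA|[A-Za-z_][\w.-]*|[(),|?*+]")


def parse_decl(decl):
    """Parse one <!ELEMENT ...> declaration into (name, content_model)."""
    match = re.fullmatch(r"\s*<!ELEMENT\s+(\S+)\s+(.*?)\s*>\s*", decl, re.S)
    if match is None:
        raise ValueError("not an element declaration: %r" % decl)
    name, spec = match.groups()
    if spec in ("EMPTY", "ANY"):
        return name, spec
    tokens = TOKEN.findall(spec)
    model, pos = parse_particle(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing tokens in %r" % spec)
    return name, model


def parse_occurrence(tokens, pos):
    """Read an optional '?', '*' or '+'; 'null' means no indicator."""
    if pos < len(tokens) and tokens[pos] in OCCURRENCE:
        return OCCURRENCE[tokens[pos]], pos + 1
    return "null", pos


def parse_particle(tokens, pos):
    """Parse a name or a parenthesized group, plus its occurrence indicator."""
    if tokens[pos] != "(":
        occurrence, after = parse_occurrence(tokens, pos + 1)
        return (tokens[pos], occurrence), after
    items, separators = [], set()
    pos += 1
    while True:
        if tokens[pos] == "#PCDATA":
            items.append("pcdata")
            pos += 1
        else:
            item, pos = parse_particle(tokens, pos)
            items.append(item)
        if tokens[pos] == ")":
            break
        separators.add(tokens[pos])  # ',' for sequences, '|' for choices
        pos += 1
    occurrence, pos = parse_occurrence(tokens, pos + 1)
    if items[0] == "pcdata":
        if len(items) == 1:
            return "pcdata", pos
        return ("mixed", ["pcdata"] + [n for n, _ in items[1:]]), pos
    kind = "alt" if "|" in separators else "seq"
    return (kind, items, occurrence), pos


print(parse_decl("<!ELEMENT div1 (head,(p|list1 |note)*, div2*)>"))
```

The DCG version gets the same effect declaratively: each grammar rule both recognizes a construct and, through its arguments, builds the corresponding term.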
Schema Generation
defineClass 'Top' super: SingleSeq { instance: 'P' 'p'; 'Spec' 'spec'; 'Div1' 'div1';};
defineClass 'P' super: Mixed { instance: List<Mixedp> mixedp;};
defineClass Mixedp super: SingleAlt { instance: String pcdata; 'A' 'a'; 'Ul' 'ul'; 'B' 'b'; 'I' 'i'; 'Em' 'em';};
defineClass 'Spec' super: MultiSeq { instance: List<Seqspec> seqspec;};
defineClass 'Seqspec' super: SingleSeq { instance: 'Front' 'front'; 'Body' 'body'; 'Back' 'back';};
defineClass 'Div1' super: SingleSeq { instance: 'Head' 'head'; List<Alt1> 'alt1'; List<Div2> 'div2';};
defineClass 'Alt1' super: SingleAlt { instance: 'P' 'p'; 'List1' 'list1'; 'Note' 'note';};
defineClass 'Name' super: Unstructured { instance: String pcdata;};
...
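The generation step can be sketched as plain text templating. The Python below is an illustration only: it reproduces the textual shape of two of the class definitions shown above (a simple sequence and an unstructured element) and deliberately leaves every other content-model shape out. The input encoding, `("seq", [(name, occurrence), ...], occurrence)` or the string `"pcdata"`, and the function names are assumptions; nothing here is claimed about ODQL beyond what the slide shows.

```python
def class_name(element):
    """Capitalize the first letter only, e.g. 'div1' -> 'Div1'."""
    return element[0].upper() + element[1:]


def define_class(element, model):
    """Render a class definition in the style of the slide for two simple
    content-model shapes; other shapes are outside this sketch."""
    if model == "pcdata":
        parent = "Unstructured"
        body = "String pcdata;"
    elif (isinstance(model, tuple) and len(model) == 3
          and model[0] == "seq" and model[2] == "null"
          and all(len(item) == 2 and item[1] == "null" for item in model[1])):
        parent = "SingleSeq"
        body = " ".join("'%s' '%s';" % (class_name(n), n) for n, _ in model[1])
    else:
        raise NotImplementedError("shape not covered by this sketch")
    return "defineClass '%s' super: %s { instance: %s};" % (
        class_name(element), parent, body)


top_model = ("seq", [("p", "null"), ("spec", "null"), ("div1", "null")], "null")
print(define_class("top", top_model))
print(define_class("name", "pcdata"))
```

Repeated or alternative groups would need the auxiliary classes seen above (for example Seqspec and Alt1), which is where most of the real generator's work lies.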
DI Parser
top(V) --> stg(top), p(P), spec(Spec), div1(Div1), etg(top).
p(V) --> stg(p), mixedp(Mixedp), etg(p).
mixedp(V) --> (pcdata(Pcdata); a(A); ul(Ul); b(B); i(I); em(Em); {false}), mixedp(_); [].
spec(V) --> stg(spec), spec1(Spec), etg(spec).
spec1(V) --> front(Front), body(Body), (back(Back); []), spec1(_); [].
div1(V) --> stg(div1), head(Head), alt1(Alt1), div21(Div21), etg(div1).
alt1(V) --> (p(P); list1(List1); note(Note); {false}), alt1(_); [].
div21(V) --> div2(Div2), div21(_); [].
name(V) --> stg(name), pcdata(Pcdata), etg(name).
DI Parser Generation
Rule_Head --> Start_Tag, Rule_Body, End_Tag, {Semantic Actions}.
for each contentModel(ElementName, ContentStructure) do
  generate the rule head for ElementName;
  generate the start tag for ElementName;
  generate the rule body for ContentStructure;
  generate the end tag for ElementName;
  generate the semantic action;
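The generation loop can be sketched in Python as text generation over a list of content models. This is an assumption-laden illustration, not the talk's generator: the content-model encoding (`"pcdata"` or `("seq", [(name, occurrence), ...], occurrence)`) and the function names are hypothetical, only the two simplest shapes are covered, and the semantic action is emitted as the placeholder `{true}` because the slides do not show the real actions.

```python
def variable(element):
    """Prolog variable for an element, e.g. 'spec' -> 'Spec'."""
    return element[0].upper() + element[1:]


def rule_body(model):
    """Body goals for the two shapes covered by this sketch."""
    if model == "pcdata":
        return ["pcdata(Pcdata)"]
    if (isinstance(model, tuple) and len(model) == 3
            and model[0] == "seq" and model[2] == "null"
            and all(len(item) == 2 and item[1] == "null" for item in model[1])):
        return ["%s(%s)" % (n, variable(n)) for n, _ in model[1]]
    raise NotImplementedError("shape not covered by this sketch")


def generate_rule(element, model):
    parts = ["stg(%s)" % element]        # start tag
    parts += rule_body(model)            # rule body from the content structure
    parts.append("etg(%s)" % element)    # end tag
    parts.append("{true}")               # placeholder semantic action
    return "%s(V) --> %s." % (element, ", ".join(parts))  # rule head + body


def generate_parser(content_models):
    """One rule per (element, content_model) pair, as in the loop above."""
    return [generate_rule(element, model) for element, model in content_models]


models = [
    ("top", ("seq", [("p", "null"), ("spec", "null"), ("div1", "null")], "null")),
    ("name", "pcdata"),
]
rules = generate_parser(models)
print("\n".join(rules))
```

Repetition and alternatives would require auxiliary recursive rules, in the way the DI parser uses names such as spec1 and alt1.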
Implementation
• We have built a prototype of the system using LPA Win-Prolog V3.5 on a personal computer.
• It consists of a DTD parser, a schema generator, and a DI parser generator.
• After creating the physical store and class family for XML documents, we can build the database schema for a DTD by executing the ODQL code produced by the schema generator.
An Extensible Query-By-Template Interface for Accessing an XML Document Database
Motivation
• Vastness of search results on current WWW search engines
• A text-based query language, even with a simple English-like syntax, is inconvenient for the user.
• Current user interfaces primarily use form-based queries.
Goal
• The goal is to design a convenient interface that lets users access XML documents without prior knowledge of the document types.
• The interface will relieve users from typing queries in a complex query language.
• The interface should be web-based and platform-independent.
System Architecture Visual Query Interface
Visual Query Facility
• Query By Example (QBE)
  • The interface is composed of tabular skeletons representing tables in the database.
• Query By Forms (QBF)
  • The interface presents a list of searchable fields, each with an entry area that can be used to indicate the search string.
• Query By Template (QBT)
  • The interface displays a template for a representative entry of the database. Users express their queries by entering search keywords in the appropriate regions of the template.
Example of Image-based QBT
Limits of Image-based QBT
• The image template is divided into regions, each of which corresponds to an element in the document structure.
• A query action is associated with each region.
• Its significant drawback is the lack of flexibility in template creation.
• It is difficult to automate reconfiguring the query actions associated with a new template.
• A single interface template for all types of documents is probably not a good idea.
Concept of eXtensible QBT (XQBT)
• The environment provides a template creator, which consists of a DTD schema browser and a scene for presentation design.
• The environment aims to configure automatically the query actions associated with the presentation of a template.
• The design of the template presentation must be tightly coupled with the arrangement of document data stored in the repository.
• Each component of the presentation design must be properly associated with the corresponding node in the object database schema.
Environment for XQBT
Template Creator
• The template creator consists of a DTD schema browser, a scene for the template draft, and a functional area.
• The template creator relies mainly on the DTD schema browser, which corresponds to the database schema.
• The scene is a visual display area where the designer can organize a template draft for a particular purpose.
• The content of the template draft is exported to a file, which contains the template presentation and additional information.
Template Creator: Functional Area
Exported File
• The file contains the presentation properties associated with each element of the template.
• Each element is annotated with its path in the database schema, so that the template executor can use this information to carry out query actions.
Template Executor
• The template executor loads the exported file and presents the template as originally designed in the template creator.
• The path of each node in the DTD schema browser is used to carry out the query action required by the user.
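How a stored path can drive a query action is sketched below in Python. Everything in it is an assumption made for illustration: the template fields (`label`, `path`), the function `run_query`, the second sample document, and the use of in-memory xml.etree.ElementTree documents as a stand-in for the object database. The talk's executor works against the database schema, not against parsed files.

```python
import xml.etree.ElementTree as ET

# Hypothetical exported template: each field pairs a display label with
# the element's path in the schema (relative to the document root).
TEMPLATE = [
    {"label": "Title", "path": "title"},
    {"label": "Ingredient", "path": "ingredientList/ingredient"},
]

# In-memory stand-in for the repository; the second document is made up
# for the example.
DOCUMENTS = [
    ET.fromstring(
        "<recipe><title>Coconut Pudding</title><ingredientList>"
        "<ingredient>12 ounces coconut milk</ingredient>"
        "<ingredient>4 to 6 tablespoons sugar</ingredient>"
        "</ingredientList></recipe>"),
    ET.fromstring(
        "<recipe><title>Plain Rice</title><ingredientList>"
        "<ingredient>1 cup rice</ingredient>"
        "</ingredientList></recipe>"),
]


def run_query(template, documents, keywords):
    """Return the documents in which every filled-in field matches its keyword.

    `keywords` maps a template label to the search string typed by the user.
    """
    paths = {field["label"]: field["path"] for field in template}

    def field_matches(document, label, keyword):
        return any(keyword.lower() in (node.text or "").lower()
                   for node in document.findall(paths[label]))

    return [document for document in documents
            if all(field_matches(document, label, keyword)
                   for label, keyword in keywords.items())]


hits = run_query(TEMPLATE, DOCUMENTS, {"Ingredient": "sugar"})
print([hit.findtext("title") for hit in hits])
```

Because the path travels with the template, the designer can rearrange the presentation freely without anyone having to rewrite the query code by hand.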
Comparison between Image-based QBT and XQBT
Image-based QBT:
• The template is an image obtained by photographing or scanning existing pages.
• The query action associated with each region is hand-coded.
• Whether planar or nested, the template is limited to a region level that is not very deep.
XQBT:
• The template is generated for a representative document.
• The associated query actions can be generated automatically for the interface program.
• The designer can change the template to meet the requirements of various region levels.
Implementation
• Java Proxies (Jp) for Jasmine
• Jp allows developers to build their applications in J-API and take advantage of the Jasmine class libraries.