An Introduction to XML. Zhu Maosheng, 2001-03-15. Main Content: 1. XML Tutorial 2. XML As a Data Representation Standard and Data Model 3. XML As a Data Interchange Standard and Information Integration 4. Repository and XML Application Server
XML Tutorial • Background: extending HTML (MathML, CML, VoiceXML); data interchange (product catalogs, health records, …) • Main characteristics: data semantics; data independence; semi-structured (schemaless, irregular) • Derived and other properties: flexible, local computing, data integration, structured text, license-free, …
XML's Goals • Enable internationalized, media-independent electronic publishing. • Allow industries to define platform-independent protocols for the exchange of data, especially the data of electronic commerce. • Deliver information to user agents in a form that allows automatic processing after receipt. • Make it easy for people to process data using inexpensive software. • Allow people to display information the way they want it. • Provide metadata (data about information) that will help people find information and help information producers and consumers find each other.
Introduction to the XML Family • Document status passes through five phases: Note -> Working Draft -> Candidate Recommendation -> Proposed Recommendation -> Recommendation. • For each specification, ask whether a given piece of software supports it, and which version. • XML 1.0 Recommendation • DTD & XML Schema Candidate Recommendation • Namespaces, XPath 1.0 Recommendation, XPointer, XLink • XSLT 1.0 Recommendation
XML 1.0 Recommendation
4.1 Basic Logical Structure
  document    ::= prolog element Misc*
  prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
  XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
  VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
  Eq          ::= S? '=' S?
  VersionNum  ::= ([a-zA-Z0-9_.:] | '-')+
  Misc        ::= Comment | PI | S
4.2 Basic Physical Structure: entities (internal, external, general, parameter)
4.3 Notes on reading the specification (EBNF)
4.4 Notes on writing a well-formed XML document
XML 1.0 Recommendation (continued)
EBNF (Extended BNF) notation: #xNNNN, [a-zA-Z], [#xNNNN-#xNNNN], [^a-zA-Z], "string", a b, a | b, a - b, a?, a+, a*
One example:
  Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
Notes on writing XML (eight points, e.g. write &lt; for < and &amp; for &):
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE greeting [
    <!ELEMENT greeting (#PCDATA)>
  ]>
  <greeting time="morning">Hello, world!</greeting>
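The well-formedness rules above can be checked mechanically with any XML parser. The following is a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample strings are illustrative and not part of the original slides. It only tests well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Attribute values must be quoted; '<' and '&' must be escaped in content.
good = '<greeting time="morning">Hello, world!</greeting>'
unquoted_attribute = '<greeting time=morning>Hello, world!</greeting>'
bare_ampersand = '<greeting>Tom & Jerry</greeting>'
escaped = '<greeting>Tom &amp; Jerry &lt;3</greeting>'

for sample in (good, unquoted_attribute, bare_ampersand, escaped):
    print(is_well_formed(sample), sample)
```

The parser resolves the predefined entities when reading, so the text of the last sample comes back as `Tom & Jerry <3`.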
XML Example
  <?xml version="1.0" standalone="yes"?>
  <BIB>
    <BOOK nickname="Dragon book">
      <AUTHOR id="aho"> Aho, A. V. </AUTHOR>
      <AUTHOR id="sethi"> Sethi, R. </AUTHOR>
      <AUTHOR id="ullman"> Ullman, J. D. </AUTHOR>
      <TITLE> Compilers: Principles, Techniques, and Tools </TITLE>
      <PUBLISHER> Addison-Wesley </PUBLISHER>
      <YEAR> 1985 </YEAR>
    </BOOK>
    <BOOK>
      <AUTHOR idref="ullman"/>
      <TITLE> Principles of Database and Knowledge-Base Systems, Vol. 1 </TITLE>
    </BOOK>
    ...
  </BIB>
DTD & XML Schema Candidate Recommendation
  <!DOCTYPE BIB [
    <!ELEMENT BIB (BOOK+)>
    <!ELEMENT BOOK (AUTHOR+, TITLE, PUBLISHER?, YEAR?)>
    <!ATTLIST BOOK isbn CDATA #IMPLIED
                   nickname CDATA #IMPLIED>
    <!ELEMENT AUTHOR (#PCDATA)>
    <!ATTLIST AUTHOR id ID #IMPLIED
                     idref IDREF #IMPLIED>
    <!ELEMENT TITLE (#PCDATA)>
    <!ELEMENT PUBLISHER (#PCDATA)>
    <!ELEMENT YEAR (#PCDATA)>
  ]>
Drawbacks of DTDs: non-XML syntax, no data types, no namespace support (e.g. <!ELEMENT mybib:BIB ...>)
XML Schema
  <xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
    <xsd:element name="BOOK" type="BOOK_TYPE"/>
    <xsd:complexType name="BOOK_TYPE">
      <xsd:element name="AUTHOR" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="TITLE" type="xsd:string"/>
      <xsd:element name="PUBLISHER" type="xsd:string" minOccurs="0" maxOccurs="1"/>
      <xsd:element name="YEAR" type="xsd:decimal" minOccurs="0" maxOccurs="1"/>
      <xsd:attribute name="isbn" use="optional" type="xsd:string"/>
      <xsd:attribute name="nickname" use="optional" type="xsd:string"/>
    </xsd:complexType>
  </xsd:schema>
Other Schema Languages • XDR (first XML-Data, then XML-Data Reduced; <- DCD; Microsoft) • SOX (Schema for Object-Oriented XML; <- DTD; Commerce One) • DSD (AT&T) • DCD (Document Content Description) • DDML (Document Definition Markup Language) • Facets on which they differ: syntax in XML, namespaces, include, import, datatypes, attributes, elements, inheritance, uniqueness • XML Schema is complete but complex (Candidate Recommendation)
Namespaces, XPath 1.0 Recommendation, XPointer, XLink
Namespaces: why are they needed? To avoid name clashes.
  Declaration: <BIB xmlns:mybib="http://www.myserver.net/">
  Also: scope, default namespace, identifier
XPath: a location path is composed of location steps; a location step contains an axis, a node test, and predicates.
  child::AUTHOR[position()<3]/attribute::id
  Abbreviations: @, //, /, ., ..
  .//para = self::node()/descendant-or-self::node()/child::para
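The location paths above can be tried out on a small bibliography. This is a sketch only, assuming Python's standard-library `xml.etree.ElementTree`, which implements a limited subset of XPath: it has no `position()<3` predicate and no attribute axis, so that path is emulated with list slicing and `get()`. The document string and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BIB = """<BIB>
  <BOOK nickname="Dragon book">
    <AUTHOR id="aho">Aho, A. V.</AUTHOR>
    <AUTHOR id="sethi">Sethi, R.</AUTHOR>
    <AUTHOR id="ullman">Ullman, J. D.</AUTHOR>
    <TITLE>Compilers: Principles, Techniques, and Tools</TITLE>
  </BOOK>
</BIB>"""

root = ET.fromstring(BIB)
book = root.find("BOOK")

# Emulates child::AUTHOR[position()<3]/attribute::id relative to BOOK.
first_two_ids = [author.get("id") for author in book.findall("AUTHOR")[:2]]

# .//AUTHOR selects AUTHOR elements at any depth below the context node.
all_authors = [author.text for author in root.findall(".//AUTHOR")]

# Predicates supported by ElementTree: attribute tests and last().
ullman = root.find(".//AUTHOR[@id='ullman']")
last_author = book.find("AUTHOR[last()]")

print(first_two_ids, all_authors, ullman.text, last_author.get("id"))
```

A full XPath 1.0 engine would evaluate the original expressions directly; the slicing here is a stand-in for the positional predicate.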
Namespaces, XPath 1.0 Recommendation, XPointer, XLink (continued)
XPointer: extends XPath with ranges (scope location), string matching, and URIs (URN, URL)
  <a xml:link="simple" href="#xpointer(id('foo'))"/>
  xpointer(id("sec2.1")/descendant::P[last()] to id("sec2.2")/descendant::P[last()])
XLink: types are simple, locator, arc, extended, group
  <AUTHOR xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="http://www-cs-faculty.stanford.edu/~knuth/"
          xlink:role="don_knuth_homepage"
          xlink:show="embed"
          xlink:actuate="onLoad">
    Donald Knuth
  </AUTHOR>
XSLT 1.0 Recommendation
An XSL stylesheet is a well-formed XML file that contains a set of templates. Each template is composed of a pattern (XPath) and directives.
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
      <HTML><xsl:apply-templates/></HTML>
    </xsl:template>
    <xsl:template match="BIB">
      <UL><xsl:apply-templates/></UL>
    </xsl:template>
    <xsl:template match="BOOK">
      <LI><xsl:apply-templates/></LI>
    </xsl:template>
    <xsl:template match="AUTHOR">
      <xsl:value-of select="."/>
    </xsl:template>
    <xsl:template match="TITLE">
      <EM><xsl:value-of select="."/></EM>
    </xsl:template>
  </xsl:stylesheet>
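To make the pattern/directive idea concrete, the template rules can be imitated by a recursive function that dispatches on the element name. This is not an XSLT processor: it is a hand-written Python sketch using only the standard library, and it assumes two simplifications that real XSLT does not make (whitespace around text is trimmed, and whitespace-only text between elements is dropped). The function names are illustrative.

```python
import xml.etree.ElementTree as ET
from html import escape

def string_value(node):
    # Roughly <xsl:value-of select="."/>: all descendant text, here trimmed.
    return escape("".join(node.itertext()).strip())

def apply_templates(node):
    # Roughly <xsl:apply-templates/>: process child elements in document order.
    return "".join(transform(child) for child in node)

def transform(node):
    # Each branch plays the role of one <xsl:template match="...">.
    if node.tag == "BIB":
        return "<UL>" + apply_templates(node) + "</UL>"
    if node.tag == "BOOK":
        return "<LI>" + apply_templates(node) + "</LI>"
    if node.tag == "AUTHOR":
        return string_value(node)
    if node.tag == "TITLE":
        return "<EM>" + string_value(node) + "</EM>"
    # Approximation of the built-in rule for unmatched elements: emit their text.
    return string_value(node)

def transform_document(xml_text):
    # Plays the role of the template matching "/".
    return "<HTML>" + transform(ET.fromstring(xml_text)) + "</HTML>"

doc = "<BIB><BOOK><AUTHOR> Aho, A. V. </AUTHOR><TITLE> Compilers </TITLE></BOOK></BIB>"
print(transform_document(doc))
```

The point of the sketch is the control flow: matching a node selects a rule, and `apply_templates` hands the children back to the rule set, which is how the stylesheet walks the tree without explicit loops.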
XML As a Data Representation Standard and Data Model • Why do we need a model of XML? (design, programming & implementation) • Conceptual model (modeling tools) • Three facets (data structure, operators, constraints) • Architecture (data model, operation algebra, syntax) • Data models: network, hierarchical, relational, object-oriented, OEM, XML (advantages & disadvantages) • Differences between them (navigational, structural) • ER -> Relational -> relational algebra -> SQL • UML -> Object-oriented -> object algebra -> OQL • ERX -> XML -> XML algebra -> XQL
Relational Data Model • Data structure: relation, key • Operations: theta-select, project, theta-join, divide, union, intersection, set difference; extended to bags • Constraints: entity integrity, referential integrity, user-defined • Relational algebra & relational calculus • SQL
SQL • SELECT <attribute list> • FROM <relation list> • WHERE <condition> • GROUP BY <attribute list> • HAVING <condition> • ORDER BY <attribute list>
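The clause skeleton above can be exercised end to end with an in-memory database. This is a minimal sketch assuming Python's standard-library `sqlite3`; the `book` table, its columns, and all rows other than the first are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, publisher TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        ("Compilers", "Addison-Wesley", 1985),
        ("Book B", "Addison-Wesley", 1990),       # hypothetical row
        ("Book C", "Hypothetical Press", 1988),   # hypothetical row
        ("Book D", "Other House", 1979),          # hypothetical row
        ("Book E", "Other House", 1995),          # hypothetical row
        ("Book F", "Other House", 1999),          # hypothetical row
    ],
)

# WHERE filters rows, GROUP BY forms groups, HAVING filters groups,
# ORDER BY sorts the final result.
rows = conn.execute(
    """
    SELECT publisher, COUNT(*) AS n
    FROM book
    WHERE year >= 1980
    GROUP BY publisher
    HAVING COUNT(*) >= 2
    ORDER BY publisher
    """
).fetchall()
print(rows)
```

Note the order of evaluation: "Book D" is removed by WHERE before grouping, so "Other House" is counted with two books rather than three.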
Object-Oriented Data Model • Data structure: class, object • Operations: on classes (method, property, inheritance, ...): define, create, access, modify, destroy; on objects (property, ...): create, access, update, delete, query • Constraints: uniqueness (OID, attribute name, method name); existence (method implementation, …)
XML Data Model • It is commonly modeled as an edge-labeled or node-labeled directed graph. • Node-labeled directed graph
XML Data Model (XML InfoSet)
• Data structure: Document, Elements, Attributes, Namespaces, Processing Instructions, Comments, Values.
• Operations (functional notation): in general, constructors & accessors; each kind of node has its own operations.
  A document has:
    uri      : DocNode -> URIRefValue
    children : DocNode -> [Ref(ElemNode) | Ref(PINode) | Ref(CommentNode)]
  An attribute has:
    name  : AttrNode -> Ref(QNameValue)
    value : AttrNode -> Ref(ValueNode)
• Constraints: ID, IDREF, IDREFS
Example:
  <?xml version="1.0"?>
  <p:part xmlns:p="http://www.mywebsite.com/PartSchema"
          xsi:schemaLocation="http://www.mywebsite.com/PartSchema http://www.mywebsite.com/PartSchema"
          name="nutbolt">
    <mfg>Acme</mfg>
    <price>10.50</price>
  </p:part>
  children(D1) = [ E1 ]
  root(D1) = E1
  name(E1) = QNameValue("http://www.mywebsite.com/PartSchema", "part", Ref(Def_QName))
  children(E1) = [ E2, E3 ]
  attributes(E1) = { A1 }
  namespaces(E1) = { N1 }
  type(E1) = Ref(Def_part_type)
  parent(E1) = D1
  name(A1) = QNameValue(null, "name", Ref(Def_QName))
  value(A1) = StringValue("nutbolt", Ref(Def_string))
  parent(A1) = E1
  prefix(N1) = StringValue("p", Ref(Def_string))
  uri(N1) = URIRefValue("http://www.mywebsite.com/PartSchema", Ref(Def_uriReference))
  parent(N1) = E1
  name(E2) = QNameValue(null, "mfg", Ref(Def_QName))
  children(E2) = [ StringValue("Acme", Ref(Def_string)) ]
  attributes(E2) = {}
  namespaces(E2) = {}
  type(E2) = Ref(Def_string)
  parent(E2) = E1
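The accessor functions listed above have rough counterparts in most XML APIs. The sketch below uses Python's standard-library `xml.etree.ElementTree` and defines three small helpers (`name`, `children`, `attributes`) that are illustrative names chosen to mirror the accessors, not an API of the data model itself. It assumes a simplified copy of the example without the `xsi:schemaLocation` attribute, because the `xsi` prefix is not declared in the example as shown; type information (`type(E1)`) has no equivalent here.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the example: the xsi:schemaLocation attribute is omitted.
PART = """<p:part xmlns:p="http://www.mywebsite.com/PartSchema" name="nutbolt">
  <mfg>Acme</mfg>
  <price>10.50</price>
</p:part>"""

def name(elem):
    """Return (namespace URI or None, local name), like a QNameValue."""
    if elem.tag.startswith("{"):
        uri, local = elem.tag[1:].split("}", 1)
        return (uri, local)
    return (None, elem.tag)

def children(elem):
    """Return the child elements in document order."""
    return list(elem)

def attributes(elem):
    """Return the attributes as an unordered name -> value mapping."""
    return dict(elem.attrib)

e1 = ET.fromstring(PART)
e2, e3 = children(e1)
print(name(e1), name(e2), name(e3), attributes(e1), e2.text)
```

As in the listing, the element `part` carries a namespace URI while `mfg` and `price` do not, and children form an ordered list while attributes form an unordered set.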
Master Fundamentals • Hierarchy: parent/child, ancestor/descendant • Sequence: immediately precedes, precedes • Position: absolute, relative, ranges
Transformations among these models • Between XML and relational • Between XML and object-oriented • Between relational and object-oriented • Between hierarchical and relational • Between network and relational
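As one illustration of the XML-to-relational direction, a document can be "shredded" into tables. The sketch below is one possible mapping among many, assuming Python's standard-library `sqlite3` and `xml.etree.ElementTree`; the table layout, the `position` column, and the function name `shred` are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

BIB = """<BIB>
  <BOOK nickname="Dragon book">
    <AUTHOR id="aho"> Aho, A. V. </AUTHOR>
    <AUTHOR id="sethi"> Sethi, R. </AUTHOR>
    <TITLE> Compilers: Principles, Techniques, and Tools </TITLE>
    <YEAR> 1985 </YEAR>
  </BOOK>
</BIB>"""

def shred(xml_text, conn):
    """Store each BOOK as one row and each AUTHOR as one row keyed by book."""
    conn.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    conn.execute("CREATE TABLE author (book_id INTEGER, position INTEGER, name TEXT)")
    books = ET.fromstring(xml_text).findall("BOOK")
    for book_id, book in enumerate(books, start=1):
        title = (book.findtext("TITLE") or "").strip()
        year_text = (book.findtext("YEAR") or "").strip()
        year = int(year_text) if year_text else None  # YEAR is optional
        conn.execute("INSERT INTO book VALUES (?, ?, ?)", (book_id, title, year))
        # The position column records document order, which relations do not keep.
        for position, author in enumerate(book.findall("AUTHOR"), start=1):
            conn.execute(
                "INSERT INTO author VALUES (?, ?, ?)",
                (book_id, position, (author.text or "").strip()),
            )

conn = sqlite3.connect(":memory:")
shred(BIB, conn)
names = [
    row[0]
    for row in conn.execute(
        "SELECT a.name FROM author a JOIN book b ON a.book_id = b.book_id "
        "WHERE b.year = 1985 ORDER BY a.position"
    )
]
print(names)
```

Repeating child elements become rows in a separate table, and a join reassembles them; the explicit position column is the price of keeping XML's ordering in an unordered model.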
XML Query • Why do we need XML queries (views, integration)? • Query operations: union, intersection, difference, join, projection, selection, sort, aggregation (XML Query Algebra, language, and use cases) • Nine features necessary for an XML query language: 1. Clean semantics (select expr from path-expr where cond / for path-expr where cond return result-set) 2. Path expressions 3. Return an XML document 4. Query and return XML elements & attributes
XML Query (continued) 5. Type coercion (semi-structured data) 6. Handle unexpected data (inexact matches) 7. Query XML without a Schema/DTD (wildcards) 8. Return a tree 9. Preserve order • Five popular XML query languages: Lorel, XML-QL (AT&T), XML-GL (Politecnico di Milano), XSL, XQL • Who wins? XQuery (W3C, combining them) • Views (maintenance, as in search engines) • Update language (management) • Triggers -> active views (e-commerce: actor, view, rule, notify)
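Several of the nine features can be seen in a small hand-written query. The sketch below is plain Python over `xml.etree.ElementTree`, not XQuery or any of the languages named above; it only mimics the "for path-expr where cond return result" shape. The function name `books_before` and the `RESULT` wrapper element are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

BIB = """<BIB>
  <BOOK>
    <TITLE> Compilers: Principles, Techniques, and Tools </TITLE>
    <YEAR> 1985 </YEAR>
  </BOOK>
  <BOOK>
    <TITLE> Principles of Database and Knowledge-Base Systems, Vol. 1 </TITLE>
  </BOOK>
</BIB>"""

def books_before(bib, year):
    """for each BOOK, where YEAR < year, return its TITLE inside a RESULT tree."""
    result = ET.Element("RESULT")              # features 3 and 8: return an XML tree
    for book in bib.iter("BOOK"):              # feature 2: path expression, no schema needed
        year_text = book.findtext("YEAR")
        if year_text is None:
            continue                           # feature 6: tolerate missing data
        try:
            value = int(year_text.strip())     # feature 5: coerce text to a number
        except ValueError:
            continue
        if value < year:
            title = book.find("TITLE")
            item = ET.SubElement(result, "BOOK")   # feature 9: document order is kept
            if title is not None:
                ET.SubElement(item, "TITLE").text = (title.text or "").strip()
    return result

result = books_before(ET.fromstring(BIB), 1990)
print(ET.tostring(result, encoding="unicode"))
```

The second book has no YEAR element, so it is skipped rather than causing an error, which is the behaviour a query language for semi-structured data is expected to provide.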
XML As a Data Interchange Standard and Information Integration • Why XML: it is self-describing and flexible • Integration levels: same data model (relational data integration); different data models (relational, object-oriented, text, HTML) • Integration methods: federated databases, warehousing (combiner/extractor), mediation (mediator/wrapper) • Semantic model • Example (OEM plus browsing) • XSD (merge/separate, proximity search equal to index)
Warehousing [Figure: a warehouse fed by a combiner, which merges the output of two extractors, one per data source (datasource1, datasource2)]
Mediator-Wrapper [Figure: a mediator on top of two wrappers, one per data source (datasource1, datasource2)]
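The mediator/wrapper arrangement can be sketched in a few lines. This is an illustrative toy, not the design of any particular integration system: each wrapper hides one source's native format behind a common `titles()` method, and the mediator answers a query by asking every wrapper and merging the answers. All class and method names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

class XmlWrapper:
    """Wraps an XML source and exposes it through the common interface."""
    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def titles(self):
        return [(t.text or "").strip() for t in self._root.iter("TITLE")]

class RowWrapper:
    """Wraps a relational-style source given as (title, year) tuples."""
    def __init__(self, rows):
        self._rows = rows

    def titles(self):
        return [title for title, _year in self._rows]

class Mediator:
    """Answers queries by delegating to the wrappers; stores no data itself."""
    def __init__(self, wrappers):
        self._wrappers = wrappers

    def titles(self):
        merged = []
        for wrapper in self._wrappers:
            for title in wrapper.titles():
                if title not in merged:   # drop duplicates across sources
                    merged.append(title)
        return merged

mediator = Mediator([
    XmlWrapper("<BIB><BOOK><TITLE> Compilers </TITLE></BOOK></BIB>"),
    RowWrapper([("Compilers", 1985), ("Book B", 1990)]),  # hypothetical rows
])
print(mediator.titles())
```

Unlike warehousing, nothing is copied ahead of time: the sources are consulted at query time, so answers are current but every query pays the cost of contacting each source.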
Repository and XML Application Server • Goal: fast access • Storage methods: File/DOM (drawbacks: 1. the file is parsed for every browse/query, 2. demands too much memory, 3. indexing, 4. updates); RDBMS/SQL; OODBMS/OQL; XML-enabled (SQL Server, Oracle8i/Servlet/XSQL); native XML (Software AG Tamino, whose indexing is great) • Why do Oracle and Microsoft build a native XML server on their mature relational database technology? • Flexible storage • Distributed XML storage systems
XML Server Manager • Goals: manageability, availability • GUI manager • Schema manager • Data browsing & maintenance • Splitting oversized files into smaller ones • Import & export • Backup • Recovery • Integrity maintenance • And so on
Summary • Simple & easy programming interface • Efficient storage (relational query optimization) • Manageability • Availability
What can we learn from RDBMS/OODBMS technology? • Storage manager • Buffer manager • Indexing • Transaction manager? • Concurrency? • Query optimization
Current development in XML • Application-to-Application / Object Serialisation • Conversion Tools • Database Systems • Document / Content Management Systems • DTD/Schema Editors/Tools • Publishing Systems • Search engines • Utilities/Tools/APIs • XLink/XPointer Tools • XML Browsers • XML Editors • XML Parsers/Processors • XPath utilities • XSL formatters • XSLT editors • XSLT engines • XSLT utilities
Reading in XML • General reading: the first stop is http://www.w3c.org/xml (up to date); the first book is the XML Bible (somewhat dated) • Research & development reading, tools: DBLP Bibliography http://www.informatik.uni-trier.de/~ley/db/index.html; http://www.acm.org/sigmod/; http://www.xmlsoftware.com/; http://www.acm.org/sigmod/databaseSoftware/index.html; http://www.w3.org/Status; http://xml.apache.org • Industrial reading: Microsoft, Oracle, Sun, IBM
Thank you, everyone! Questions and discussion