Module 1 Introduction and Motivation. „If I invent another programming language, its name will contain the letter X.“ (N. Wirth, Software Pioniere Konferenz, Bonn 2001). Google Indicator. A history of „Language“. 2 x (Descartes). λ x.2x (Church). (LAMBDA (x) (* 2 x)) (McCarthy).
„If I invent another programming language, its name will contain the letter X.“ (N. Wirth, Software Pioniere Konferenz, Bonn 2001)
W3C
<?xml version="1.0"?>
<lambda-term>
  <varlist> <var>x</var> </varlist>
  <expression>
    <application>
      <expr><const>*</const></expr>
      <arg-list>
        <expr><const>2</const></expr>
        <expr><var>x</var></expr>
      </arg-list>
    </application>
  </expression>
</lambda-term>
What can the Web do for you? • Download + show HTML documents • Forms • Pre-compiled point queries • Updates in specific Web applications • Everywhere, any time, platform independent • Simple keyword search (Google) • Good for human-human, human-machine communication • Scalability to millions of users
What can the Web not do? • Applications do not understand HTML • Machine-machine communication is difficult • Distributed updates • Long transactions (business processes) • Powerful queries • Where can I find a used car for CHF 1000? • Scalability to millions of machines
Design Principles of W3C • Everybody is autonomous • Everybody can participate (open) • All standards are compatible • All standards are backward compatible • Platform and vendor independence
A little bit of history • Database world • 1970 relational databases • 1990 nested relational model and object-oriented databases • 1995 semi-structured databases • Document world • 1974 SGML (Standard Generalized Markup Language) • 1990 HTML (Hypertext Markup Language) • 1992 URL (Uniform Resource Locator) • Data + documents = information • 1996 XML (Extensible Markup Language) • URI (Uniform Resource Identifier)
What is XML? • The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web. • Base specifications: • XML 1.0, W3C Recommendation Feb '98 • Namespaces, W3C Recommendation Jan '99
XML Data Example
<book year="1967">
  <title>The politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>
Elements • Syntax, no abstract model • Documents, elements and attributes • Tree-based, nested, hierarchically organized structure
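The tree structure of the book example can be made visible with a few lines of Python. This is only a sketch using the standard library's `xml.etree.ElementTree`; the helper name `outline` is made up for illustration and is not part of the slides.

```python
import xml.etree.ElementTree as ET

# The book document from the slide, with the attribute written as year="1967".
BOOK_XML = """<book year="1967">
  <title>The politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>"""


def outline(element, depth=0):
    """Return one indented line per element, depth-first in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


book = ET.fromstring(BOOK_XML)
year = book.get("year")                      # attribute on the root element
title = book.findtext("title")               # text of a direct child
lastname = book.findtext("author/lastname")  # text of a nested element

print("\n".join(outline(book)))
print(year, title, lastname)
```

The nesting of `firstname` and `lastname` under `author` is exactly what a flat relational row would not express directly.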
XML vs. relational data • Relational data • Killer application: banking industry • Invented as a mathematically clean abstract data model • Philosophy: schema first, then data • Never had a standard syntax for data • Strict rules for data normalization, flat tables • Order is irrelevant, textual data supported but not primary goal • XML • First killer application: publishing industry • Invented as a syntax for data, only later an abstract data model • Philosophy: data and schemas should not be correlated, data can exist with or without schema, or with multiple schemas • No data normalization, flexibility is a must, nesting is good • Order may be very important, textual data support a primary goal
Reasons for the XML success • XML is a general data representation format • XML is human readable • XML is machine readable • XML is internationalized (UNICODE) • XML is platform independent • XML is vendor independent • XML is endorsed by the World Wide Web Consortium • XML is not a new technology • XML is not only a data representation format, it’s a full infrastructure of technologies
XML as a family of technologies • XML Information Set • XML Schema • XML Query • The Extensible Stylesheet Language Transformations (XSLT) • XLink, XPointer • XML Forms • XML Protocol • XML Encryption • XML Signature • Others • … almost all the pieces needed for a good XML-based information hub
Overview of XML Technologies • W3C Standards • Data: XML, Namespaces, Infoset, Schema • Communication: SOAP, Encryption, WSDL, UDDI • Processing: XPath, XSLT, XQuery, XUpdate, XQuery Text • Integration: RDF, OWL • Other Standards • Vertical domains: RosettaNet, ebXML, *ml • Workflow: BPEL • Interfaces: DOM, SAX, JAXP, SQL / XML
Motivation for XML • Data lives forever (longer than program code) • legacy systems: need to keep code to keep data • huge IT infrastructures • „hello world“ program is very complex • Model before data (you need to know what you want) • poor „time to market“, high cost • SQL + objects are not enough • middleware, data marshalling, … • No querying of objects, no encapsulation in SQL • expensive (five-star guru) programmers needed • XML: Decouple Data and Schema!!!
Killer XML advantages • Code/schema/data independence • Covers the continuous spectrum from totally structured data to documents • from data management to information management • Unique model for representing data, metadata and code
Data + metadata + code • Data (XML), schemas (XML Schemas) and code (XSLT, XQuery): they all have an XML syntax • Easy to mix and match: • Data in the schemas (not yet) • Data in code (already done) • Code in schemas (not yet) • Code in the data (not yet) : dynamic data
Misunderstanding about XML • “Data is self-describing.” • Tags don’t hold semantics, they only hold the structure of the information • The interpretation of the tags is in the application that handles the data, not in the tags themselves.
XML handicaps • “Tree, and not a graph.” • Many limitations derive from here, and many complications in the XML processing languages. • Difficulty in modeling N:M relationships • The notion of reference (e.g. XLink, XPointer) is not well integrated in the XML stack • “Duplication of concepts” • Many ways to do the same thing • Justification for a “simpler” data model like RDF • “Concepts that seem logically unnecessary” • Processing instructions (PIs), comments, documents, etc. • Additional complexity factors • xsi:nil, QName in content, etc. • Not a complete “application server” (development, deployment, management)
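To illustrate why N:M relationships are awkward in a tree, here is a small sketch in Python. The `library` document, its `id` and `authors` attributes, and the helper names are all invented for this example. A book with two authors cannot sit under both of them, so the link is encoded as identifier references that the application has to resolve itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: authors and books are siblings, linked by id references.
LIBRARY_XML = """<library>
  <author id="a1"><name>Author One</name></author>
  <author id="a2"><name>Author Two</name></author>
  <book id="b1" authors="a1 a2"><title>First Book</title></book>
  <book id="b2" authors="a1"><title>Second Book</title></book>
</library>"""

library = ET.fromstring(LIBRARY_XML)

# The parser knows nothing about the references; we build the index by hand.
authors_by_id = {a.get("id"): a for a in library.findall("author")}


def authors_of(book):
    """Resolve the space-separated references on a book to author names."""
    return [authors_by_id[ref].findtext("name") for ref in book.get("authors").split()]


def books_of(author_id):
    """Follow the relationship in the other direction by scanning all books."""
    return [
        b.findtext("title")
        for b in library.findall("book")
        if author_id in b.get("authors").split()
    ]


print(authors_of(library.find("book[@id='b1']")))
print(books_of("a1"))
```

Both directions of the relationship need application code, which is the kind of complication the slide attributes to the tree model.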
Advantages and disadvantages 1. “Handles the dual aspect of information: lexical and binary” : 1 and “01” • Essential feature for 21st-century information management • E.g. XML-based contract to be used in a legal procedure • Lots of complexity derives from here • XML Schema deals with both lexical and binary constraints • XML Data Model has to include both the dm:typed-value and dm:string-value • Processing languages like XQuery and XSLT have to define their semantics for both aspects • XML data storage and indexing heavily impacted
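The difference between the lexical and the binary view of 1 and “01” can be sketched in Python. The functions `string_value` and `typed_value_as_integer` are rough, hypothetical analogues of dm:string-value and dm:typed-value, assuming the element is declared as an integer; they are not the data model's actual definitions.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """Lexical view: the concatenated text content, exactly as written."""
    return "".join(element.itertext())


def typed_value_as_integer(element):
    """Binary view, assuming a schema declared the element as an integer."""
    return int(string_value(element).strip())


a = ET.fromstring("<quantity>1</quantity>")
b = ET.fromstring("<quantity>01</quantity>")

same_text = string_value(a) == string_value(b)                        # lexical comparison
same_number = typed_value_as_integer(a) == typed_value_as_integer(b)  # value comparison

print(same_text, same_number)
```

A signed legal contract cares about the first comparison, an invoice total about the second, which is why storage and indexing have to keep both.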
Advantages and disadvantages 2. “Data is context sensitive.” • We cannot do cut and paste in XML • Certain aspects of the data depend on the context where the fragment of data occurs (base URIs, namespaces, etc.) • Valuable feature for document management • Serious consequences for storing, indexing and processing XML • Semantics of expressions also depend on the context where they appear • Additional consequences for expression evaluation
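The cut-and-paste problem can be reproduced with namespaces. In this sketch the `order` document and the namespace URI `urn:example:parts` are invented for illustration. The fragment is valid inside the document, but once it is cut out as text it loses the prefix declaration it depended on.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the prefix p is declared on the root element only.
DOCUMENT = '<order xmlns:p="urn:example:parts"><p:item>bolt</p:item></order>'

# The same element, cut out of DOCUMENT as raw text.
FRAGMENT = "<p:item>bolt</p:item>"

root = ET.fromstring(DOCUMENT)
item_tag = root[0].tag  # the parser resolves the prefix using the root's declaration

try:
    ET.fromstring(FRAGMENT)
    fragment_parses = True
except ET.ParseError:
    # Outside its original context the prefix p is unbound.
    fragment_parses = False

print(item_tag, fragment_parses)
```

A store or index that keeps fragments separately therefore has to carry the in-scope namespaces (and base URIs) along with each fragment.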
Sources of XML data? • Inter-application communication data (WS, REST, etc.) • Mobile device communication data • Logs • Blogs (RSS) • Metadata (e.g. Schema, WSDL, XMP) • Presentation data (e.g. XHTML) • Documents (e.g. Word) • Views of other sources of data • Relational, LDAP, CSV, Excel, etc. • Sensor data • It would be interesting to know the pie chart and the evolution of each branch!
Some vertical application domains for XML • Health Level Seven (HL7) http://www.hl7.org/ • Geography Markup Language (GML) • Systems Biology Markup Language (SBML) http://sbml.org/ • XBRL, the XML-based Business Reporting standard http://www.xbrl.org/ • Global Justice XML Data Model (GJXDM) http://it.ojp.gov/jxdm • ebXML http://www.ebxml.org/ • e.g. Encoded Archival Description Application http://lcweb.loc.gov/ead/ • Digital photography metadata XMP • An XML grammar for sensor data (SensorML) • Really Simple Syndication (RSS 2.0) Basically everywhere.
RosettaNet • http://www.rosettanet.org • Non-profit organisation • Sponsors: IBM, Oracle, NEC, ... • More than 400 additional members • Goals: • Dynamic, flexible trading networks • Operational efficiency (cost reduction) • New business opportunities • Technical Goals: • Common language, standard processes for sharing of electronic business information
RosettaNet • PIPs = Partner Interface Processes • 8 Clusters • Support • Partner Product and Service Review • Product Information • Order Management • Inventory Management • Marketing Information Management • Service and Support • Manufacturing • Segments with PIP definitions in each cluster
3 Order Management • Segment 3a: Quote and Order Entry • Segment 3b: Transportation and Distribution • Segment 3c: Returns and Finance • Segment 3d: Product Configuration
Example DTD
<!--
  RosettaNet XML Message Schema
  3A1_MS_R02_00_QuoteRequest.dtd (16-Apr-2001 12:46)
  This document has been prepared by Edifecs (http://www.edifecs.com/)
  based On the Business Collaboration Framework from requirements
  in conformance with the RosettaNet methodology.
-->
<!ENTITY % common-attributes "id CDATA #IMPLIED" >
<!ELEMENT Pip3A1QuoteRequest (
    fromRole ,
    GlobalDocumentFunctionCode ,
    Quote ,
    thisDocumentGenerationDateTime ,
    thisDocumentIdentifier ,
    toRole ) >
Example DTD (ctd.)
<!ELEMENT fromRole ( PartnerRoleDescription ) >
<!ELEMENT PartnerRoleDescription (
    ContactInformation? ,
    GlobalPartnerRoleClassificationCode ,
    PartnerDescription ) >
<!ELEMENT ContactInformation (
    contactName ,
    EmailAddress ,
    facsimileNumber? ,
    telephoneNumber ) >
<!ELEMENT contactName ( FreeFormText ) >
<!ELEMENT FreeFormText ( #PCDATA ) >
<!ATTLIST FreeFormText xml:lang CDATA #IMPLIED >
Example DTD (ctd.)
<!ELEMENT Quote (
    comments? ,
    financedBy? ,
    GlobalGovernmentPriorityRatingCode? ,
    GlobalQuoteTypeCode ,
    governmentContractIdentifier? ,
    PriceCondition? ,
    QuoteCustomerInformation? ,
    QuoteLineItem+ ,
    quoteRequestIdentifier* ,
    requestedResponseDate? ,
    RequoteReference? ,
    respondTo* ,
    submittedDate? ,
    TaxExemptStatus? ,
    transportedBy? ) >
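The occurrence indicators in the Quote content model (no suffix = exactly once, ? = optional, + = one or more, * = zero or more) can be read as simple count bounds. The following Python sketch checks child counts against a subset of the model; it is not a DTD validator, it ignores the required ordering of children, and the names `QUOTE_MODEL`, `BOUNDS` and `count_violations` are invented for this example.

```python
from collections import Counter

# A subset of the Quote content model: (child name, occurrence indicator).
QUOTE_MODEL = [
    ("comments", "?"),
    ("GlobalQuoteTypeCode", ""),
    ("QuoteLineItem", "+"),
    ("quoteRequestIdentifier", "*"),
]

# Indicator -> (minimum count, maximum count); None means unbounded.
BOUNDS = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def count_violations(child_names, model=QUOTE_MODEL):
    """Return the modeled child names whose count is outside the allowed bounds."""
    counts = Counter(child_names)
    problems = []
    for name, indicator in model:
        low, high = BOUNDS[indicator]
        n = counts[name]
        if n < low or (high is not None and n > high):
            problems.append(name)
    return problems


print(count_violations(["GlobalQuoteTypeCode", "QuoteLineItem", "QuoteLineItem"]))
print(count_violations(["comments", "comments", "GlobalQuoteTypeCode"]))
```

The first call reports no problems; the second flags the repeated optional `comments` and the missing `QuoteLineItem`.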
ebXML • http://www.ebxml.org • OASIS: Organization for the Advancement of Structured Information Standards • Non-profit, ... (like RosettaNet) • Competitor to RosettaNet • ebXML Mission: To provide an open XML-based infrastructure enabling the global use of electronic business information in an interoperable, secure and consistent manner for all parties. • Uses XML Schema (not DTDs)
ebXML Example (SOAP)
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/…"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://schemas.xmlsoap.org/…">
  <SOAP:Header xmlns:eb="http://www.oasis-open.org/…"
      xsi:schemaLocation="http://…">
    <eb:MessageHeader ...> ... </eb:MessageHeader>
  </SOAP:Header>
  <SOAP:Body xmlns:eb="http://www.oasis-open.org/…"
      xsi:schemaLocation="http://…">
    <eb:Manifest eb:version="2.0"> ... </eb:Manifest>
  </SOAP:Body>
</SOAP:Envelope>
ebXML Header Info (among others)
• Conversation ID
<eb:ConversationID>2000-33-15-7</eb:ConversationID>
• Sender and Recipient
<eb:From>
  <eb:PartyId eb:type="urn:duns">123</eb:PartyId>
  <eb:PartyId eb:type="SCAC">RDWY</eb:PartyId>
  <eb:Role>http://rosettanet.org/roles/Buyer</eb:Role>
</eb:From>
<eb:To>
  <eb:PartyId>mailto:joe@example.com</eb:PartyId>
  <eb:Role>http://rosettanet.org/roles/Seller</eb:Role>
</eb:To>
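Reading these header fields from an application might look like the following Python sketch. Two assumptions are made here: the elements are wrapped in an `eb:MessageHeader` as in the SOAP example, and the namespace URI `urn:example:eb` is a placeholder, because the slides elide the real ebXML namespace.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the slides elide the real one, so this is NOT the ebXML URI.
EB = "urn:example:eb"

HEADER = f"""<eb:MessageHeader xmlns:eb="{EB}">
  <eb:ConversationID>2000-33-15-7</eb:ConversationID>
  <eb:From>
    <eb:PartyId eb:type="urn:duns">123</eb:PartyId>
    <eb:PartyId eb:type="SCAC">RDWY</eb:PartyId>
    <eb:Role>http://rosettanet.org/roles/Buyer</eb:Role>
  </eb:From>
  <eb:To>
    <eb:PartyId>mailto:joe@example.com</eb:PartyId>
    <eb:Role>http://rosettanet.org/roles/Seller</eb:Role>
  </eb:To>
</eb:MessageHeader>"""

ns = {"eb": EB}  # prefix map used only by the path expressions below
header = ET.fromstring(HEADER)

conversation_id = header.findtext("eb:ConversationID", namespaces=ns)

# Namespaced attributes are keyed as "{namespace-uri}localname" in ElementTree.
sender_ids = {
    party.get(f"{{{EB}}}type"): party.text
    for party in header.findall("eb:From/eb:PartyId", ns)
}

recipient_role = header.findtext("eb:To/eb:Role", namespaces=ns)

print(conversation_id, sender_ids, recipient_role)
```

Note that the code has to know what `eb:Role` or `eb:PartyId` mean; the tags only supply the structure.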