XML, Java, and the future of the Web. CSE 597B Computational Issues in Ecommerce Sandip Debnath, Dr. C Lee Giles Dr. David Pennock Dr. Ingemar Cox Dr. Hongyuan Zha. Layout of the Presentation (XML, Java, and the future of the Web). Background (HTML, SGML) etc. XML Effort What is XML?
XML, Java, and the future of the Web. CSE 597B Computational Issues in Ecommerce. Sandip Debnath, Dr. C Lee Giles, Dr. David Pennock, Dr. Ingemar Cox, Dr. Hongyuan Zha
Layout of the Presentation(XML, Java, and the future of the Web) • Background (HTML, SGML, etc.) • XML Effort • What is XML? • Why XML? • How can it be used? • XML syntax, elements, attributes, validation, support, parsing, and displaying • Related concepts: CSS, XSL, etc. • Advanced concepts: Namespace, CDATA, Encoding, Server, etc. (will be discussed later) • XML applications and technologies • Java Effort • General Java Concept • Java for XML
Background (HTML, SGML)(XML, Java, and the future of the Web) • Most documents on the Web are in HTML (which is based on SGML, ISO 8879) • Problems in HTML: • Extensibility: HTML does not allow users to specify their own tags. • Structure: HTML does not support the specification of deep structures. • Validation: The HTML specification does not allow consuming applications to check data for structural validity. • Structural looseness: HTML itself is not strict enough to impose structural integrity. • SGML, however, provides these capabilities, and the features that Web applications need were drawn from it to create a new markup language, XML.
Birth of XML(XML, Java, and the future of the Web) • The first phase started in June ’96 and culminated in XML 1.0, issued in Feb ’98 • The second phase resulted in XML Namespaces (Jan ’99) and Style Sheet Linking (June ’99) • In Sep ’99, the third phase started, to complete the unfinished work of the second phase and to begin work on XML Query • The XML Protocol activity was launched in Sep ’00 • Working groups • Schema working group • Query working group • Linking working group • Core working group • Coordination group
What is XML anyway?(XML, Java, and the future of the Web) • XML stands for eXtensible Markup Language • XML is a markup language much like HTML. • XML was designed to describe data. • XML tags are not predefined. You must define your own tags. • XML uses a DTD (Document Type Definition) to describe the data. • XML with a DTD is designed to be self-descriptive. • XML differs from HTML in the following ways: • Information providers can define new tag and attribute names at will • Document structures can be nested to any level of complexity • Any XML document can contain an optional description of its grammar, which the consuming application can use to understand and validate its structural integrity.
Why XML?(XML, Java, and the future of the Web) • The differences from HTML show the initial benefits of XML and the reasons behind its birth. • “XML Will • Enable internationalized media-independent electronic publishing • Allow industries to define platform-independent protocols for the exchange of data, especially the data of electronic commerce • Deliver information to user agents in a form that allows automatic processing after receipt • Make it easier to develop software to handle specialized information distributed over the Web • Make it easy for people to process data using inexpensive software • Allow people to display information the way they want it, under style sheet control • Make it easier to provide metadata -- data about information -- that will help people find information and help information producers and consumers find each other” --- W3C activity statement
Why is XML so important?(XML, Java, and the future of the Web) • Plain text: XML is not a binary format, so you can create and edit files with anything from a standard text editor to a visual development environment. That makes it easy to debug your programs, and makes it useful for storing small amounts of data. • Data identification: XML tells you what kind of data you have, not how to display it. Because the markup tags identify the information and break up the data into parts, an email program can process it, a search program can look for messages sent to particular people, and an address book can extract the address information from the rest of the message. In short, because the different parts of the information have been identified, they can be used in different ways by different applications. • Stylability: When display is important, the stylesheet standard, XSL, lets you dictate how to portray the data. • Inline reusability: Unlike HTML, XML lets you include entities "in line" in a document. The included sections look like a normal part of the document, so you can search the whole document at one time or download it in one piece. That lets you modularize your documents without resorting to links. You can single-source a section so that an edit to it is reflected everywhere the section is used, and yet a document composed from such pieces looks for all the world like a one-piece document. • Linkability: The XLink protocol is a proposed specification to handle links between XML documents. In general, the XLink specification targets a document or document segment using its ID. The XPointer specification defines mechanisms for "addressing into the internal structures of XML documents", without requiring the author of the document to have defined an ID for that segment. • Easily processed: Because XML is a vendor-neutral standard, you can choose among several XML parsers, any one of which takes the work out of processing XML data.
• Hierarchical: XML documents benefit from their hierarchical structure. Hierarchical document structures are, in general, faster to access because you can drill down to the part you need, like stepping through a table of contents. They are also easier to rearrange, because each piece is delimited. In a document, for example, you could move a heading to a new location and drag everything under it along with the heading, instead of having to page down to make a selection, cut, and then paste the selection into a new location.
How can it be used?(XML, Java, and the future of the Web) • <?xml version="1.0"?> • <customer-details id="AcPharm39156"> • <name>Acme Pharmaceuticals Co.</name> • <address country="US"> • <street>7301 Smokey Boulevard</street> • <city>Smallville</city> • <state>Indiana</state> • <postal>94571</postal> • </address> • </customer-details> • Start and end tags must match (unlike HTML, XML enforces this strictly) • Element: a piece of information marked by tags • Attribute: a name-value pair inside a start tag (e.g. country="US") • Note the nesting of tags
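As a minimal sketch of how an application might consume the document above, the following Java program parses it with the JDK's standard DOM parser (JAXP) and reads one attribute and a few nested elements. The class name CustomerDetailsReader and its helper methods are hypothetical names chosen for illustration; they do not come from the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class CustomerDetailsReader {
    // The customer-details document from the slide, as a Java string.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<customer-details id=\"AcPharm39156\">"
        + "<name>Acme Pharmaceuticals Co.</name>"
        + "<address country=\"US\">"
        + "<street>7301 Smokey Boulevard</street>"
        + "<city>Smallville</city>"
        + "<state>Indiana</state>"
        + "<postal>94571</postal>"
        + "</address>"
        + "</customer-details>";

    // Parses the XML text and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the text of the first descendant element with the given tag name.
    static String textOf(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);
        Element address = (Element) root.getElementsByTagName("address").item(0);
        System.out.println("id      = " + root.getAttribute("id"));
        System.out.println("name    = " + textOf(root, "name"));
        System.out.println("country = " + address.getAttribute("country"));
        System.out.println("city    = " + textOf(address, "city"));
    }
}
```

Attributes are read with getAttribute, while nested element data is reached by walking down from the parent element, which mirrors the element/attribute distinction made on the slide.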
How can it be used? (contd)(XML, Java, and the future of the Web) XML is a low-level syntax for representing structured data. You can use this simple syntax to support a wide variety of applications. (The following figure is taken from http://www.W3C.org)
How can it be used? (contd)(XML, Java, and the future of the Web) • XML can separate data from HTML • XML can be used to exchange data • XML and B2B: XML is expected to become the main language for financial data exchange • XML can be used to share data • XML can be used to store data • XML can be used to create new languages (e.g. WML, the markup language of WAP)
XML syntax(XML, Java, and the future of the Web) • XML documents use a simple, self-describing syntax (choosing descriptive names is the creator's responsibility) • <?xml version="1.0"?> • <note> • <to>Tove</to> • <from>Jani</from> • <heading>Reminder</heading> • <body>Don't forget me this weekend!</body> • </note> • All XML elements must have an opening and a closing tag • XML tags are case sensitive • XML elements must be properly nested • XML documents must have a single root element • Attribute values must always be quoted • XML preserves white space in element content (unlike HTML, which collapses it) • With XML, CR/LF is always converted to LF
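The syntax rules above can be checked mechanically: a non-validating parser rejects any document that breaks them. The sketch below, which assumes the JDK's built-in JAXP parser, wraps that check in a hypothetical helper named WellFormedCheck.isWellFormed (the name is ours, not from the slides).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {
    // Returns true if the text parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a syntax rule was violated
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<note><to>Tove</to><from>Jani</from></note>", // follows the rules
            "<note><to>Tove</To></note>",                  // tag case mismatch
            "<b><i>text</b></i>",                          // improper nesting
            "<to>Tove</to><from>Jani</from>",              // no single root
            "<note date=2001></note>"                      // unquoted attribute value
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```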
XML elements(XML, Java, and the future of the Web) • XML documents can be extended to carry more information • XML elements have relationships (parent, child, sibling) • In the last slide “note” is the root element (a document must have a root element) • In the last slide “to”, “from”, etc. are children of the root and siblings of each other • Elements can have different content • Mixed content • Simple content • Attributes • Element naming rules • Names can contain letters, numbers, and other characters • Names must not start with a number or a punctuation character • Names must not start with the letters xml (or XML or Xml ..) • Names cannot contain spaces
XML Attributes(XML, Java, and the future of the Web) • XML elements can optionally have attributes • <img src="computer.gif"> • <a href="demo.asp"> • Both quote styles, "demo.asp" and 'demo.asp', are valid • Data can be stored either in attributes or in child elements. Either of the following • <person sex="female"> • <firstname>Anna</firstname> • <lastname>Smith</lastname> • </person> • or • <person> • <sex>female</sex> • <firstname>Anna</firstname> • <lastname>Smith</lastname> • </person> • is valid.
XML Validation(XML, Java, and the future of the Web) • Well-formed XML: an XML document that follows the XML syntax rules correctly • Valid XML: an XML document that is well formed and also conforms to the rules of its DTD • You can reference the DTD by name inside a well-formed XML document: • <?xml version="1.0"?> • <!DOCTYPE note SYSTEM "InternalNote.dtd"> • <note> • <to>Tove</to> • <from>Jani</from> • <heading>Reminder</heading> • <body>Don't forget me this weekend!</body> • </note>
XML Validation(contd.)(XML, Java, and the future of the Web) • DTD (Document Type Definition): A DTD defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be declared inline in the XML document or as an external reference. • <?xml version="1.0"?> • <!DOCTYPE note [ • <!ELEMENT note (to,from,heading,body)> • <!ELEMENT to (#PCDATA)> • <!ELEMENT from (#PCDATA)> • <!ELEMENT heading (#PCDATA)> • <!ELEMENT body (#PCDATA)> • ]> • <note> • <to>Tove</to> • <from>Jani</from> • <heading>Reminder</heading> • <body>Don't forget me this weekend</body> • </note>
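To illustrate the difference between well-formed and valid, the sketch below turns on DTD validation in the JDK's JAXP parser and collects the validity errors it reports for a note document with an inline DTD. The class DtdValidation, its validate method, and the NOTE_DTD constant are hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, for illustration only.
class DtdValidation {
    // XML declaration plus the inline DTD for "note" shown on the slide.
    static final String NOTE_DTD =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to,from,heading,body)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "  <!ELEMENT from (#PCDATA)>\n"
        + "  <!ELEMENT heading (#PCDATA)>\n"
        + "  <!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    // Returns the list of problems found; an empty list means the document is valid.
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add(e.getMessage()); // not even well formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        String valid = NOTE_DTD + "<note><to>Tove</to><from>Jani</from>"
            + "<heading>Reminder</heading><body>Don't forget me this weekend</body></note>";
        String invalid = NOTE_DTD + "<note><to>Tove</to><from>Jani</from></note>";
        System.out.println("valid note problems:   " + validate(valid));
        System.out.println("invalid note problems: " + validate(invalid));
    }
}
```

The second document is well formed but omits heading and body, so it violates the content model declared for note and is reported as invalid.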
XML Validation-DTD(contd.)(XML, Java, and the future of the Web) The DTD above is interpreted like this: • !DOCTYPE note (in line 2) defines that this is a document of the type note. • !ELEMENT note (in line 3) defines the note element as having four elements: "to,from,heading,body". • !ELEMENT to (in line 4) defines the to element to be of the type "#PCDATA". • !ELEMENT from (in line 5) defines the from element to be of the type "#PCDATA", and so on… (PCDATA = Parsed Character DATA)
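XML Validation-DTD(contd.)(XML, Java, and the future of the Web) <!ELEMENT XXX (AAA? , BBB+)> <!ELEMENT AAA (CCC? , DDD*)> <!ELEMENT BBB (CCC | DDD)> <!ELEMENT CCC (#PCDATA)> <!ELEMENT DDD (#PCDATA)> • In a content model, ? means zero or one occurrence, + means one or more, * means zero or more, a comma means a required sequence, and | means a choice between alternatives.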
XML Support(XML, Java, and the future of the Web) • Netscape has promised full XML support in its next browser. • IE 5.0 supports XML1.0 and the XML DOM (these are set by W3C). IE 5.0 has the following support. • Viewing of XML documents • Full support for W3C DTD standards • XML embedded in HTML as Data Islands • Binding XML data to HTML elements • Transforming and displaying XML with XSL • Displaying XML with CSS • Access to the XML DOM
XML Parsing(XML, Java, and the future of the Web) • The following are some of the well-known XML parsers available in the market… • GNOME XML Library (Unix/Linux/Windows) • Oracle XML parser for Java (Java) • XP (Java) • XML Validate (Java) • Xerces-C++ (Win32 (MSVC 6.0 compiler); Linux (RedHat 6.0), Unix) • Oracle XML parser for C (Linux, Solaris 2.6 and NT 4 / Service Pack 3 (and above)) • Lark (Java) • XML4cobol (Cobol) • XML parser for PL/SQL (Oracle 8i) • HEX (Java) • TclXML (Tcl) • Xjparser (Java) • ActiveDOM (ActiveX) • Xmlproc (Python) • Xparse (JavaScript) • Java Project X (Java) • SAX2 XML Utilities (Java) • Electric XML (Windows, Unix) • SAX ActiveX Control (ActiveX) • Xerces-J (Java) • CUEXml ActiveX (ActiveX) • Pull Parser (C++, Java)
XML Parsing(contd.)(XML, Java, and the future of the Web) • XML parser for C++ (C++) • DTDParser (Java; versions for Windows, Linux, Unix) • XML::Parser (Perl) • Xerces-P (Perl) • XML4C (C) • TinyXML (Java) • XML for Java (Java) • AElfred (Java) • XmlTree (VB) • XML Validator (C++; binaries available for Windows and Linux-Intel platforms) • XMLBooster (C, Cobol, Delphi, and Java) • SP (C++) • JAXP (Java) • Larval (Java) • Markup (O’Caml) • Fxp (SML) • SXP: Silfide XML Parser • X-Fetch Performer (Windows) • Microsoft XML Parser • RXP (Unix, Win32) • CUEXml Delphi (Delphi 3.0) • PHP XML Parser (PHP) • JSXML XML Tools (JavaScript) • expat (C) • Tony (Objective Caml 2.0)
XML Parsing(contd.)(XML, Java, and the future of the Web) Parsing XML with Java requires a basic introduction to Java. The next few slides introduce the Java programming language.
Java Efforts(XML, Java, and the future of the Web) • The Java programming language has introduced the concept of a • platform-independent, • 100% object-oriented • programming language that also has other good features such as • Automatic garbage collection • Simple, pointer-less programming • No multiple inheritance • A huge number of APIs • Networking support • Servlet classes that resemble CGI, and • Support for traditional programming as well as new industry trends.
Java Efforts (contd.)(XML, Java, and the future of the Web) • Some of the products under the Java umbrella are… • Java™ 2 Platform, Standard Edition (J2SE™) • The essential Java 2 SDK, tools, runtimes, and APIs for developers writing, deploying, and running applets and applications in the Java programming language. Also includes the earlier Java Development Kit versions JDK™ 1.1 and JRE 1.1 • Java™ 2 Platform, Enterprise Edition (J2EE™) • Combines a number of technologies in one architecture with a comprehensive Application Programming Model and Compatibility Test Suite for building enterprise-class server-side applications. • Java™ 2 Platform, Micro Edition (J2ME™) • A highly optimized Java runtime environment targeting a wide range of consumer products, including pagers, cellular phones, screenphones, digital set-top boxes and car navigation systems. • Consumer & Embedded Technologies & Products • The Java Consumer and Embedded technologies and products let you write code for small devices that are big on functionality but short on resources.
Java Efforts (contd.)(XML, Java, and the future of the Web) • COMPLETE PRODUCT LIST (by product group) • Java 2 Platform, Standard Edition Product Family • Software Development Kits & Runtimes • Java™ 2 SDK, Standard Edition, v 1.3 • Java™ 2 SDK, Standard Edition, v 1.2.2 • Java™ 2 SDK, Standard Edition, Source Release • Java™ 2 Runtime Environment, Standard Edition, v 1.2.2 • Java™ Plug-in • Java™ Web Start • Java Development Kit (JDK™) 1.1.8 (JDK 1.1.8) • Java™ Runtime Environment 1.1.8 (JRE 1.1.8) • JDK™ Japanese Supplement 1.1.x • Related Products • JavaBeans™ Development Kit (BDK) • Java HotSpot™ Server Virtual Machine • Application Programming Interfaces (APIs), core to the Java 2 platform • Collections Framework • Java™ Foundation Classes (JFC) • Swing Components • Pluggable Look & Feel • Accessibility • Drag and Drop • Security • Java™ IDL • JDBC™ • JavaBeans™ • Remote Method Invocation (RMI) • Java 2D™
Java Efforts (contd.)(XML, Java, and the future of the Web) • Java 2 Platform, Enterprise Edition • Technologies • Enterprise JavaBeans™ Architecture • JavaServer Pages™ • Java™ Servlet • Java Naming and Directory Interface™ (JNDI) • Java™ IDL • JDBC™ • Java™ Message Service (JMS) • Java™ Transaction API (JTA) • Java™ Transaction Service (JTS) • JavaMail • RMI-IIOP • Software Development Kit & Application Model • Java 2 SDK, Enterprise Edition • Sun BluePrints™ Design Guidelines for J2EE
Java Efforts (contd.)(XML, Java, and the future of the Web) • Consumer & Embedded Technologies & Products • Technologies • Java 2 Platform, Micro Edition (J2ME™ technology) • Connected Device Configuration (CDC) • Connected Limited Device Configuration (CLDC) • C Virtual Machine (CVM) • K Virtual Machine (KVM) • PersonalJava™ Application Environment • PersonalJava™ Technology, Source Edition • EmbeddedJava™ Application Environment • EmbeddedJava™ Technology, Source Edition • Java Card™ • JavaPhone™ API • Java TV™ API • Jini™ Network Technology • Mobile Information Device Profile (MIDP) • Products • Personal Applications™ Suite • Java Dynamic Management™ Kit • Java Embedded Server™ Software
Java Efforts (contd.)(XML, Java, and the future of the Web) • Optional Packages • Optional Packages define APIs that extend the core Java platform API. • Forte Fusion™ • Forte™ for Java™ • HotJava™ Product Family • The JAIN™ APIs: JAIN™ TCAP, JAIN™ OAM • Java Blend™ • JavaCheck™ • Java™ Electronic Commerce Framework • Java™ Internationalization & Localization Toolkit 2.0 • Java™ Message Queue • JavaServer™ Product Family • Java™ Shared Data Toolkit • JavaSpaces™ • Java™ Speech API • Java™ Telephony API (JTAPI) • Jini™ Network Technology • Jiro™ Technology • OSS through Java™ Initiative
XML Parsing(contd.)(XML, Java, and the future of the Web) • There are two main types of parsing of XML available in these parsers… • SAX or Simple API for XML • DOM or Document Object Model • The Java SAX Parser API structure is as shown here (Taken from Sun’s Java site)
XML Parsing(contd.)(XML, Java, and the future of the Web) The Java DOM Parser API structure is as shown here (Taken from Sun’s Java site)
XML Parsing(contd.)(XML, Java, and the future of the Web) • When to use SAX and When to use DOM ? • SAX: • If the information stored in your XML documents is machine readable (and generated) data then SAX is the right API for giving your programs access to this information. Machine readable and generated data include things like: • Java object properties stored in XML format • queries that are formulated using some kind of text based query language (SQL, XQL, OQL) • result sets that are generated based on queries (this might include data in relational database tables encoded into XML). • So machine generated data is information that you normally have to create data structures and classes for in Java. A simple example is the address book which contains information about persons, as shown in Figure 1. This address book XML file is not like a word processor document, rather it is a document that contains pure data, which has been encoded into text using XML.
XML Parsing(contd.)(XML, Java, and the future of the Web) • When to use SAX and When to use DOM ? • SAX: • When your data is of this kind, you have to create your own data structures and classes (object models) anyway in order to manage, manipulate and persist this data. SAX allows you to quickly create a handler class which can create instances of your object models based on the data stored in your XML documents. An example is a SAX document handler that reads an XML document that contains my address book and creates an AddressBook class that can be used to access this information. The first SAX tutorial shows you how to do this. The address book XML document contains person elements, which contain name and email elements. My AddressBook object model contains the following classes: • AddressBook class, which is a container for Person objects • Person class, which is a container for name and email String objects. • So my "SAX address book document handler" is responsible for turning person elements into Person objects, and then storing them all in an AddressBook object. This document handler turns the name and email elements into String objects.
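The address book handler described above might look roughly like the following sketch. It uses the standard SAX API shipped with the JDK; the AddressBook and Person class names and the person, name, and email element names come from the slide, while the field names, the load helper, and the root element name used in main are assumptions made for illustration. This is not the code from the tutorial the slide refers to.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Container for name and email strings (field names are assumptions).
class Person {
    String name;
    String email;
}

// Container for Person objects.
class AddressBook {
    final List<Person> persons = new ArrayList<>();
}

// SAX document handler that turns person elements into Person objects.
class AddressBookHandler extends DefaultHandler {
    final AddressBook book = new AddressBook();
    private Person current;                             // person being built, if any
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                              // start collecting fresh text
        if (qName.equals("person")) {
            current = new Person();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                 // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;                                     // outside any person element
        }
        if (qName.equals("name")) {
            current.name = text.toString().trim();
        } else if (qName.equals("email")) {
            current.email = text.toString().trim();
        } else if (qName.equals("person")) {
            book.persons.add(current);
            current = null;
        }
    }

    // Hypothetical convenience method: parses XML text into an AddressBook.
    static AddressBook load(String xml) {
        try {
            AddressBookHandler handler = new AddressBookHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.book;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse address book", e);
        }
    }

    public static void main(String[] args) {
        // The root element name "addressbook" is an assumption.
        AddressBook book = load("<addressbook>"
            + "<person><name>Ann</name><email>ann@example.org</email></person>"
            + "<person><name>Bob</name><email>bob@example.org</email></person>"
            + "</addressbook>");
        for (Person p : book.persons) {
            System.out.println(p.name + " <" + p.email + ">");
        }
    }
}
```

The handler never holds the whole document in memory; it keeps only the object currently being built, which is the property that makes SAX a natural fit for data-oriented XML.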
XML Parsing(contd.)(XML, Java, and the future of the Web) When to use SAX and When to use DOM ? DOM: If your XML documents contain document data (e.g., Framemaker documents stored in XML format), then DOM is a completely natural fit for your solution. If you are creating some sort of document information management system, then you will probably have to deal with a lot of document data. An example of this is the Datachannel RIO product, which can index and organize information that comes from all kinds of document sources (like Word and Excel files). In this case, DOM is well suited to allow programs access to information stored in these documents. However, if you are dealing mostly with structured data (the equivalent of serialized Java objects in XML) DOM is not the best choice. That is when SAX might be a better fit.
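For document-style data, the in-memory tree that DOM builds can be traversed and rearranged freely. The sketch below, again assuming the JDK's JAXP parser, extracts the running text of a small mixed-content document and then moves its last section to the front. The class DomDocumentWalk, its methods, and the doc/section/b element names are hypothetical and chosen only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class DomDocumentWalk {
    // Parses XML text into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects all text nodes below the given node, in document order.
    static String plainText(Node node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString();
    }

    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    // Moves the root's last child (with everything under it) to the front.
    static void moveLastChildFirst(Document doc) {
        Element root = doc.getDocumentElement();
        Node first = root.getFirstChild();
        Node last = root.getLastChild();
        if (last != null && last != first) {
            root.insertBefore(last, first); // inserting an attached node moves it
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><section>One <b>bold</b> word.</section>"
            + "<section>Two.</section></doc>");
        System.out.println(plainText(doc.getDocumentElement()));
        moveLastChildFirst(doc);
        System.out.println(plainText(doc.getDocumentElement()));
    }
}
```

Because the whole tree stays in memory, moving a section carries its nested markup along with it, which is hard to do with an event-based API such as SAX.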
XML Displaying(XML, Java, and the future of the Web) To display an XML document, you can attach a CSS (Cascading Style Sheets) file that defines all the necessary styles. <?xml version="1.0"?> <?xml-stylesheet type="text/css" href="cd_catalog.css"?> <CATALOG> <CD> <TITLE>Empire Burlesque</TITLE> <ARTIST>Bob Dylan</ARTIST> <COUNTRY>USA</COUNTRY> <COMPANY>Columbia</COMPANY> <PRICE>10.90</PRICE> <YEAR>1985</YEAR> </CD> <CD > … </CD>
XML Displaying (contd.)(XML, Java, and the future of the Web) The CSS file may look like this… CATALOG { background-color: #ffffff; width: 100%; } CD { display: block; margin-bottom: 30pt; margin-left: 0; } TITLE { color: #FF0000; font-size: 20pt; } ARTIST { color: #0000FF; font-size: 20pt; } COUNTRY, PRICE { display: block; color: #000000; margin-left: 20pt; } YEAR, COMPANY { display: block; color: #00FF00; margin-left: 20pt; }
XML Displaying (contd.)(XML, Java, and the future of the Web) The output will look like… Empire Burlesque Bob Dylan USA Columbia 10.90 1985
The Related Concepts/buzzwords(XML, Java, and the future of the Web) • CSS: Cascading Style Sheets • XSL: Extensible Stylesheet Language • XSLT (+XPath): XSL Transformations (with the XML Path Language) • RELAX: Regular Language description for XML • SOX: Schema for Object-oriented XML • TREX: Tree Regular Expressions for XML • Schematron: a rule-based schema language built on XPath assertions • RDF: Resource Description Framework • XTM: XML Topic Maps • SMIL: Synchronized Multimedia Integration Language • MathML: Mathematical Markup Language • DrawML: Drawing Meta Language • ICE: Information and Content Exchange • ebXML: Electronic Business using XML • cXML: Commerce XML • CBL: Common Business Library
The Advanced Concepts(XML, Java, and the future of the Web) The Namespace, CDATA, Encoding, Server etc. (will be discussed later)
XML Applications/technologies(XML, Java, and the future of the Web) • The following types of applications are driving XML: • Applications that require web clients to mediate between two or more heterogeneous databases. • Applications that attempt to distribute a significant portion of the processing load from the Web server to the Web client. • Applications that require the Web client to present different views of the same data to different users. • Applications in which intelligent Web agents attempt to tailor information discovery to the needs of individual users.
A small XML application(1)(XML, Java, and the future of the Web) • 1) First we start with a simple XML document. • Take a look at our original demonstration document, the CD catalog. • <?xml version="1.0"?> • <CATALOG> • <CD> • <TITLE>Empire Burlesque</TITLE> • <ARTIST>Bob Dylan</ARTIST> • <COUNTRY>USA</COUNTRY> • <COMPANY>Columbia</COMPANY> • <PRICE>10.90</PRICE> • <YEAR>1985</YEAR> • </CD> . . ... more ... . • The full file is here
A small XML application(2)(XML, Java, and the future of the Web) 2) Load the document into a Data Island. A Data Island can be used to access the XML file. To get your XML document "inside" an HTML page, add an XML Data Island to the page. <xml src="cd_catalog.xml" id="xmldso" async="false"> </xml> With the example code above, the XML file "cd_catalog.xml" will be loaded into an "invisible" Data Island called "xmldso". The async="false" attribute is added to the Data Island to make sure that all the XML data is loaded before any other HTML processing takes place.
A small XML application(3)(XML, Java, and the future of the Web) 3) Bind the Data Island to an HTML Table An HTML table can be used to display the XML data. To make your XML data visible on your HTML page, you must "bind" your XML Data Island to an HTML element. To bind your XML data to an HTML table, add a data source attribute to the table, and add data field attributes to <span> elements inside the table data: <table datasrc="#xmldso" width="100%" border="1"> <thead> <th>Title</th> <th>Artist</th> <th>Year</th> </thead> <tr align="left"> <td><span datafld="TITLE"></span></td> <td><span datafld="ARTIST"></span></td> <td><span datafld="YEAR"></span></td> </tr></table>
A small XML application(4)(XML, Java, and the future of the Web) 4) Bind the Data Island to <span> or <div> elements <span> or <div> elements can be used to display XML data. You don't have to use a table to display your XML data. Data from a Data Island can be displayed anywhere on an HTML page. All you have to do is to add some <span> or <div> elements to your page. Use the data source attribute to bind the elements to the Data Island, and the data field attribute to bind each element to an XML element, like this: <br />Title: <span datasrc="#xmldso" datafld="TITLE"></span> <br />Artist: <span datasrc="#xmldso" datafld="ARTIST"></span> <br />Year: <span datasrc="#xmldso" datafld="YEAR"></span>
A small XML application(5)(XML, Java, and the future of the Web) 5) Add a Navigation Script to your XML. Navigation has to be performed by a script. To add navigation to the XML Data Island, create a script that calls the movenext() and moveprevious() methods of the Data Island's recordset. <script type="text/javascript"> function movenext() { var x = xmldso.recordset; if (x.absoluteposition < x.recordcount) { x.movenext(); } } function moveprevious() { var x = xmldso.recordset; if (x.absoluteposition > 1) { x.moveprevious(); } } </script>
A small XML application(6)(XML, Java, and the future of the Web) The result looks like this…
Title | Artist | Country | Company | Price | Year
Empire Burlesque | Bob Dylan | USA | Columbia | 10.90 | 1985
Hide your heart | Bonnie Tyler | UK | CBS Records | 9.90 | 1988
Greatest Hits | Dolly Parton | USA | RCA | 9.90 | 1982