Introduction to XML
Marek Podgorny and Lukasz Beca, EECS SU and CollabWorx, Inc.
Syracuse University, Fall 2002
Markup Languages
• Marking up text is a methodology for encoding data with information about itself
• A yellow highlighter is a valid markup methodology
  • You decide which parts of the document are important
  • It is portable: others can benefit from your markup
• Two critical properties of a valid markup:
  • A standard must be in place to define what a valid markup is
    • Above, markup is defined as a bit of yellow ink atop text
    • In HTML, a markup is a <font color=yellow>tag</font>
  • A standard must be in place to define what markup means
    • Yellow highlight means the highlighted text represents an important point
    • In HTML, each tag carries a well-defined formatting instruction
CPS606, Fall 2002, EECS SU & CollabWorx
What is XML?
• Like HTML, XML (Extensible Markup Language) is a markup language that relies on rule-specifying tags and on a tag-processing application that knows how to deal with those tags
• For HTML, the application is a browser
  • This is because HTML is a presentation markup
• For XML, the application can be anything
  • XML may be processed by browsers, but its application domain is huge and not even completely understood today
eXtensibility of XML
• The most important technical difference between XML and HTML is that while HTML is a closed set of tags, XML is a meta-language for defining other markup languages
• XML specifies the standards with which you can define your own markup languages with their own sets of tags
• This very statement makes people nervous…
• We will discuss the methodology for defining a new language, but in practice very few people will ever write a DTD
Made-up Markup Language (MuML)
<CONTACT>
  <NAME>Kim Smith</NAME>
  <ID>027</ID>
  <COMPANY>WebtopSystems Inc.</COMPANY>
  <EMAIL>kim@webtopsystems.com</EMAIL>
  <PHONE>315 443-4868</PHONE>
  <STREET>111 College Pl</STREET>
  <CITY>Syracuse</CITY>
  <STATE>New York</STATE>
  <ZIP>13244</ZIP>
</CONTACT>
This is a chunk of well-formed XML. How is it useful? The Netscape browser surely doesn't know what to do with it…
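A browser may not know what to do with MuML, but a generic XML parser can already read it. The following is a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` module; the helper name `contact_to_dict` is made up for illustration and is not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The MuML contact record from the slide, held in a string.
MUML = """<CONTACT>
  <NAME>Kim Smith</NAME>
  <ID>027</ID>
  <COMPANY>WebtopSystems Inc.</COMPANY>
  <EMAIL>kim@webtopsystems.com</EMAIL>
  <PHONE>315 443-4868</PHONE>
  <STREET>111 College Pl</STREET>
  <CITY>Syracuse</CITY>
  <STATE>New York</STATE>
  <ZIP>13244</ZIP>
</CONTACT>"""


def contact_to_dict(xml_text):
    """Parse one <CONTACT> element into a {tag: text} dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


contact = contact_to_dict(MUML)
print(contact["NAME"], "-", contact["EMAIL"])
```

The parser needs no knowledge of what a CONTACT is; it only relies on the document being well-formed.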
How to make MuML useful?
• There must be a set of rules that lets us, or a computer, understand the syntax of the language
  • In XML, this information is provided to the processing application by a Document Type Definition (DTD)
  • The DTD specifies what it means to be a valid tag: the syntax for marking up
• There must be a set of rules defining the meaning (semantics) of the markup
  • To specify what valid tags mean, XML documents are also associated with style sheets, which provide GUI instructions for a processing application such as a web browser
  • Note that other application domains of XML might do without a style sheet, e.g., an application using XML as an object serialization technique
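To make the role of the syntax rules concrete, here is a rough sketch of the kind of check a DTD enables. The Python standard library does not validate against DTDs, so the content model is hand-coded here as a stand-in; the DTD fragment in the comment and the function name `follows_contact_model` are assumptions for illustration, not taken from the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD for MuML (an assumption), shown for reference only:
#   <!ELEMENT CONTACT (NAME, ID, COMPANY, EMAIL, PHONE, STREET, CITY, STATE, ZIP)>
#   <!ELEMENT NAME (#PCDATA)>   ... and likewise for the other child elements
EXPECTED_CHILDREN = ["NAME", "ID", "COMPANY", "EMAIL", "PHONE",
                     "STREET", "CITY", "STATE", "ZIP"]


def follows_contact_model(xml_text):
    """Return True if the root is CONTACT and its children match the sequence above."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well-formed
    if root.tag != "CONTACT":
        return False
    tags = [child.tag for child in root]
    # Each child may hold text only, mirroring (#PCDATA) in the assumed DTD.
    text_only = all(len(child) == 0 for child in root)
    return tags == EXPECTED_CHILDREN and text_only


good = "<CONTACT>" + "".join(f"<{t}>x</{t}>" for t in EXPECTED_CHILDREN) + "</CONTACT>"
bad = "<CONTACT><NAME>Kim Smith</NAME></CONTACT>"
print(follows_contact_model(good), follows_contact_model(bad))
```

A validating XML processor would derive this check from the DTD itself rather than from a hard-coded list.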
Style Sheet Pseudo-Code
• Anytime you see a <CONTACT>, display it using a <UL> tag. </CONTACT> tags should be converted to </UL>
• All <NAME> tags should be replaced with <LI> tags, and </NAME> tags with </LI>
• All <EMAIL> tags should be replaced with <LI> tags, and </EMAIL> tags should be ignored
The style sheet uses the functionality of HTML to define the formatting of MuML. For non-browser apps, the HTML translation is irrelevant.
The processing application combines the logic of the style sheet, the DTD, and the data of the MuML document, and displays the result according to the rules and the data.
So instead of simple HTML we got three different chunks. Why the pain?
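The pseudo-code rules can be sketched as a small tag-mapping transform. This is an illustration in Python, not how an XSL processor works internally; the `RULES` table and the function `muml_to_html` are hypothetical names, and the choice to drop tags that have no rule is an assumption, since the pseudo-code does not say what happens to them.

```python
import xml.etree.ElementTree as ET

# Mapping taken from the pseudo-code: MuML tag -> HTML tag.
RULES = {"CONTACT": "UL", "NAME": "LI", "EMAIL": "LI"}


def muml_to_html(xml_text):
    """Rewrite a MuML <CONTACT> document into an HTML list (hypothetical helper)."""
    source = ET.fromstring(xml_text)
    target = ET.Element(RULES[source.tag])
    for child in source:
        html_tag = RULES.get(child.tag)
        if html_tag is None:
            continue  # assumption: tags without a rule are not displayed
        item = ET.SubElement(target, html_tag)
        item.text = child.text
    return ET.tostring(target, encoding="unicode")


html = muml_to_html(
    "<CONTACT><NAME>Kim Smith</NAME><ID>027</ID>"
    "<EMAIL>kim@webtopsystems.com</EMAIL></CONTACT>"
)
print(html)
```

Because the sketch builds a tree and serializes it, every `<LI>` gets a closing tag, whereas the pseudo-code simply ignores `</EMAIL>`; both render the same way in a browser.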
Complex XML World
• We need a processing agent that will put together the DTD, the style sheet, and the data
  • Note: Web browsers are barely up to the task yet
• Formal definition:
  • "A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application."
• And this is not yet all…
Build your own ColdFusion?
• XML allows each specific industry to develop its own tag sets to meet its unique needs
  • It doesn't force everyone's browser to incorporate zillions of tag sets, or developers to settle for a tag set that is too generic to be useful
• Compelling? Well…
• The real power of XML:
  • Not only can you define your own set of tags, but the rules specified by those tags are not limited to formatting rules
  • XML allows you to define all sorts of tags with all sorts of rules: tags representing business rules, or tags representing data description or data relationships
  • As these tags are reflected in the DOM, you can do computation on documents!
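As an illustration of computing on a document through the DOM, the sketch below totals an order. It uses Python's standard `xml.dom.minidom`; the `<order>` vocabulary, the sample figures, and the function `order_total` are invented for this example and do not come from the lecture.

```python
from xml.dom.minidom import parseString

# Hypothetical order document (illustrative data only).
DOC = """<order>
  <item><price>40</price><quantity>2</quantity></item>
  <item><price>150</price><quantity>1</quantity></item>
</order>"""


def order_total(xml_text):
    """Walk the DOM tree and sum price * quantity over all <item> elements."""
    dom = parseString(xml_text)
    total = 0
    for item in dom.getElementsByTagName("item"):
        price = int(item.getElementsByTagName("price")[0].firstChild.data)
        quantity = int(item.getElementsByTagName("quantity")[0].firstChild.data)
        total += price * quantity
    return total


total = order_total(DOC)
print(total)
```

The tags carry meaning (price, quantity) rather than formatting, which is what makes the computation possible.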
Why are HTML's days numbered?
• The GUI is embedded in the data
  • What happens if you decide that you like a table-based presentation better than a list-based presentation?
• Searching for information in the data is tough
• The data is tied to the logic and language of HTML and hence to browsers
  • What if I want to use my data in a Java applet?
HTML: <LI>State: Ohio <LI>State: Oregon
XML: <state>Ohio</state> <state>Oregon</state>
• How do I find all records for Ohio?
• What is the relationship between Ohio and Oregon?
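The "find all records for Ohio" question becomes a simple query once the state is tagged. A minimal sketch in Python follows; the `<records>` wrapper, the `<record>` and `<name>` elements, and the sample data are assumptions added for illustration (only `<state>` appears on the slide).

```python
import xml.etree.ElementTree as ET

# Hypothetical records; note that one company name contains the word "Ohio".
RECORDS = """<records>
  <record><name>Ann</name><state>Ohio</state></record>
  <record><name>Bob</name><state>Oregon</state></record>
  <record><name>Ohio Supply Co.</name><state>Oregon</state></record>
</records>"""


def names_in_state(xml_text, state):
    """Return the names of all records whose <state> element equals the given state."""
    root = ET.fromstring(xml_text)
    return [rec.findtext("name") for rec in root.findall("record")
            if rec.findtext("state") == state]


ohio = names_in_state(RECORDS, "Ohio")
# A naive text search cannot tell a state from a company name.
text_hits = [line for line in RECORDS.splitlines() if "Ohio" in line]
print(ohio, len(text_hits))
```

The tag-based query returns only the record located in Ohio, while the plain text search also matches the company whose name happens to contain the word.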
HTML Search in Action
Long Live XML!
• With XML, the GUI and data are divorced
  • Thus, changes to display do not require messing with the data: a separate style sheet will specify a table display or a list display
• Searching the data is easy and efficient
  • Search engines can parse description-bearing tags rather than muddling in the data. Tags provide them with the intelligence they otherwise lack
• Complex relationships (trees, inheritances, classes) can be communicated
• The code is much more legible to a lay person
  • It is obvious that <ID>911</ID> represents an ID whereas <LI>911 might not. XML is self-describing
Why isn't it there if it is so good?
• No XML applications…
  • IE 5.0 provides some support for XSL and XML if output is HTML
  • Netscape 5.0 (Mozilla) also implements support for XML but not for XSL
• A quote: "XML isn't about display -- it's about structure. This has implications that make the browser question secondary. So the whole issue of what is to be displayed and by what means is intentionally left to other applications. You can target the same XML (with different XSL) for different devices (standard web browser, palm pilot, printer, etc.). You should not get the impression that XML is useless until browsers support it. This is definitely not true -- we are using it at NASA in ways where no browser plays any role." - Ken Sall, NASA IT Manager
XML Design Goals
• Enable better search algorithms (metadata)
• Enable presentation of various views for same data
• Integrate data from different sources
• Provide easy use over the Internet
• Create documents readable even by humans
• Support data interchange
• Enable easy development of document processing applications
XML - Summary
• Extensible Markup Language - a subset of Standard Generalized Markup Language (SGML)
• Universal format for describing structured data on the Web
• Specification developed by the World Wide Web Consortium (W3C), supervised by its XML Working Group
Applications of XML
• XML languages
• XML protocols
• Support for XML
  • Client side
  • Server side
• XML and databases
• Data interchange
XML Deployment
• XML is a basis for the development of industry language and protocol standards
• Corporations and academic organizations form special organizations (consortiums or forums) in order to develop standards for whole branches of industry. Examples: the World Wide Web Consortium and the WAP Forum
Extensible HyperText Markup Language (XHTML)
• XML-based syntax
• Extensibility through XHTML modules allows the combination of existing and new feature sets when developing content and when designing new user agents (web browsers, portable devices, etc.)
• Examples of modules:
  • required modules: structure, basic text, hypertext, lists
  • optional modules: presentation, forms, tables, images, stylesheets, applets, frames, etc.
• XHTML is designed with general user agent interoperability in mind; XHTML documents should be displayable on any type of XHTML-compliant device
• Current version - XHTML™ 1.0, DTD specification available at the http://www.w3.org site
Synchronized Multimedia Integration Language (SMIL)
• SMIL allows developers to combine media streams that are presented together and synchronized with each other
• For example, a SMIL document can specify:
  • where the visual content appears in the player
  • when audio or video (or another type of stream) starts and stops playing
• Users need a special player to view SMIL documents
• Products supporting SMIL: Real Networks - Realplayer, Apple - QuickTime
• See: http://www.empirenet.com/~joseram/smil_intro/smil_intro.html for a tutorial about SMIL written in SMIL
• Current version - SMIL 1.0, specification available at the http://www.w3.org site
Wireless Application Protocol (WAP) and Wireless Markup Language (1)
• Forecasted users of wireless services by 2001 - 530 million
• Devices in use today, and those coming in the future, have multimedia capabilities: receiving/sending e-mail, accessing the Internet
• Wireless Application Protocol - standard for the presentation and delivery of wireless information and telephony on mobile phones and other wireless terminals
  • handset manufacturers that represent 90 percent of the world market support this standard
• Wireless Markup Language (WML) - part of the standard, designed to describe information to be presented on small displays
  • WML documents can be accessed over the Internet using the standard HTTP protocol
  • traditional servers can be used for hosting WML documents
Simple Object Access Protocol (SOAP)
• Support for Remote Procedure Call and messaging mechanisms over various protocols (for example, HTTP), implemented in XML
• Describes conventions for the definition of:
  • method calls
  • method parameters
  • results of method calls
  • serialization mechanisms for encoding application-defined data types
• Since SOAP messages can be transported over the HTTP protocol, the currently deployed Web infrastructure becomes one distributed computing platform (distributed objects can be placed on HTTP servers)
• Current version - SOAP 1.1 (status: W3C Note), specification available at the http://www.w3.org site
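To show what "a method call expressed in XML" looks like, here is a rough sketch that builds a SOAP 1.1 envelope with Python's standard library. The envelope namespace is the one defined by SOAP 1.1; the application namespace `urn:example:parts`, the method `GetPrice`, the parameter `partId`, and the function `build_call` are all hypothetical. The sketch only builds the message and does not send it.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:parts"  # hypothetical application namespace


def build_call(method, params):
    """Serialize a method call and its parameters as a SOAP 1.1 envelope string."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The method call becomes an element; each parameter becomes a child element.
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_call("GetPrice", {"partId": "p001"})
print(message)
```

In a real exchange this string would typically travel as the body of an HTTP POST, and the response would come back as another envelope carrying the result.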
Support for XML in Web Browsers
• Internet Explorer 5.0+
  • Extensible Markup Language
  • Extensible Stylesheet Language
  • Cascading Stylesheets
  • Document Object Model
  • Data Islands
• Mozilla 5.0
  • Extensible Markup Language
  • Cascading Stylesheets
  • Document Object Model
  • Graphical User Interface built using XUL (Extensible User Interface Language) - users can provide their own user interface documents to customize the layout of the browser
• Microbrowsers for portable devices
  • Wireless Markup Language
Support for XML on Server Side
• Web servers can host XML documents
• XML documents can be dynamically generated by servlets, JSP pages, and ASP pages
• XML adapters allow translation from application-specific formats to XML
• XML documents can be stored in databases for fast retrieval
• Enterprise applications with XML processing functionality can be easily built using available XML parser components and XSL processors
XML Document and Database (1)
• Information stored in a database

Part Name   Part ID   Price ($)   InStock
window      001       40          yes
muffler     002       150         yes
door        003       30          no
XML Document and Database (2)
• The same information represented as an XML document
<store>
  <part id="p001">
    <part-name>window</part-name>
    <price>40</price>
    <instock>yes</instock>
  </part>
  <part id="p002">
    <part-name>muffler</part-name>
    <price>150</price>
    <instock>yes</instock>
  </part>
</store>
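The mapping from table rows to the `<store>` document can be sketched in a few lines of Python. The rows are the three parts from the database slide; holding them in a list of tuples and the function name `rows_to_store_xml` are assumptions for illustration, since the lecture does not say how the database is accessed.

```python
import xml.etree.ElementTree as ET

# Rows from the parts table: (part name, part ID, price, in stock).
ROWS = [
    ("window", "001", 40, True),
    ("muffler", "002", 150, True),
    ("door", "003", 30, False),
]


def rows_to_store_xml(rows):
    """Build a <store> element with one <part> child per table row."""
    store = ET.Element("store")
    for name, part_id, price, in_stock in rows:
        part = ET.SubElement(store, "part", id=f"p{part_id}")
        ET.SubElement(part, "part-name").text = name
        ET.SubElement(part, "price").text = str(price)
        ET.SubElement(part, "instock").text = "yes" if in_stock else "no"
    return store


store = rows_to_store_xml(ROWS)
print(ET.tostring(store, encoding="unicode"))
```

Each column becomes a child element and the part ID becomes the `id` attribute, matching the layout used on the slide.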
Data Interchange
• One of the most costly aspects of Enterprise Application Integration - conversion of proprietary data formats to other data formats
• XML - new data interchange standard
• Information handled by different applications and data sources can be converted into XML to provide a uniform data format
• Using XML
  • applications can exchange data easily
  • application-specific data can be used on the Internet