XML and Web Services CSI 5389 (E-Commerce Technologies)
Outline
• Why XML?
• An Introduction to XML
• Web Services
Why XML?
What’s Wrong with HTML?
• HTML (Hypertext Markup Language) was developed by Tim Berners-Lee in the early 1990s as a simple application of SGML (Standard Generalized Markup Language).
  • It is a simple language, well suited for hypertext, multimedia, and the display of small and reasonably simple documents.
• SGML is a standard language for defining and using document formats (ISO 8879).
  • SGML itself is too complicated to understand and to use (accessible only to experts).
• Although HTML is workable for simple documents, it mixes up the structure of a document with the display of that document.
What’s Wrong with HTML (cont.)?
• HTML has been extended in disorganized and incompatible ways by Netscape and Microsoft.
  • To compete with each other, these two companies have added their own HTML tags and implemented different interpretations of the same tags.
• Many Web sites today contain tagging that is written for a specific browser.
  • These Web pages work properly only with their intended browser, and therefore not with other browsers.
What’s Wrong with HTML (cont.)?
• In addition, there are other limitations:
  • Extensibility: HTML does not allow users to specify their own tags or attributes in order to parameterize or semantically qualify their data.
  • Structure: HTML does not support the specification of deep structures needed to represent database schemas or object-oriented hierarchies.
  • Validation: HTML does not support the kind of language specification that allows consuming applications to check data for structural validity on importation.
The XML Effort
• XML (Extensible Markup Language) was developed starting in 1996 by a working group of the W3C (World Wide Web Consortium).
• XML is a standardized language to represent structured data as text files.
• XML advantages:
  • XML provides strong separation of the structure of a document and the display of that document.
  • Information providers can define new tags and attributes at will.
  • Document structures can be nested to any level of complexity.
  • Any XML document can contain an optional description of its grammar for use by applications that need to perform structural validation.
The Main Point
• By defining our own markup language, we can encode the information in our documents much more precisely than is possible with HTML.
• This means that programs processing these documents can “understand” them much better and can therefore process the information in ways that are impossible with HTML.
• Example: Imagine that we mark up recipes (say, for seafood dishes) according to some definition in which we enter the amounts of the ingredients needed for each dish.
  • We can write a program that, given a list of the contents of our fridge, goes through the list of recipes and makes a list of the dishes we could make with the available ingredients.
  • Given nutritional information about the ingredients, the program could sort the dishes by the number of calories in each dish.
  • Given price information for the ingredients, the program could sort the dishes by the price of each dish, and so on.
• The possibilities are almost endless, because the information is encoded in a way that the computer can “understand”.
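The recipe idea can be sketched in a few lines of Python. This is only an illustration: the `<recipes>` markup, its element and attribute names, and the fridge contents are all assumptions made up for the example, not a format defined in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe markup; element and attribute names are illustrative only.
RECIPES_XML = """
<recipes>
  <recipe name="Grilled salmon">
    <ingredient name="salmon" amount="200"/>
    <ingredient name="lemon" amount="1"/>
  </recipe>
  <recipe name="Shrimp pasta">
    <ingredient name="shrimp" amount="150"/>
    <ingredient name="pasta" amount="100"/>
  </recipe>
</recipes>
"""


def makeable_dishes(recipes_xml, fridge):
    """Return the names of recipes whose ingredients are all available in the fridge."""
    root = ET.fromstring(recipes_xml)
    dishes = []
    for recipe in root.findall("recipe"):
        needed = {
            item.get("name"): float(item.get("amount"))
            for item in recipe.findall("ingredient")
        }
        # A dish is makeable when the fridge holds at least the needed amount of everything.
        if all(fridge.get(name, 0) >= amount for name, amount in needed.items()):
            dishes.append(recipe.get("name"))
    return dishes


# Assumed fridge contents, in the same (unspecified) units as the recipe amounts.
fridge = {"salmon": 250, "lemon": 2, "pasta": 500}
print(makeable_dishes(RECIPES_XML, fridge))
```

Sorting by calories or price would follow the same pattern, with one more lookup table keyed by ingredient name.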
Web Applications of XML
• The applications that need XML are those that cannot be accomplished within the limitations of HTML. These applications can be divided into four categories:
  • Applications that require the Web client to mediate between two or more heterogeneous databases.
  • Applications that attempt to distribute a significant proportion of the processing load from the Web server to the Web client.
  • Applications that require the Web client to present different views of the same data to different users.
  • Applications in which intelligent Web agents attempt to tailor information discovery to the needs of individual users.
Web Applications of XML: An Example
• Let’s consider a typical example of the first category of XML applications: the information tracking system for a home health care agency.
• A patient entering a home health care agency is represented to the information system by a large collection of paper-based materials documenting the patient’s medical history.
• The major task in accepting the patient into the system is the manual entry of these materials into the agency’s database.
Web Applications of XML: An Example (cont.)
• First solution (commonly used in practice):
  • Log into the hospital’s Web site.
  • Become an authorized user.
  • Access the patient’s medical records using a Web browser.
  • Print out the records from the Web browser.
  • Manually key in the data from the printouts.
• Second solution (slightly better):
  • Instead of printing out the patient’s medical records, the operator reads the records in the Web browser and keys the data directly into the agency’s online forms in a separate window.
  • This solution saves the paper that would have been needed for the printouts, but does nothing to address the root of the problem.
Web Applications of XML: An Example (cont.)
• Desired solution:
  • Log into the hospital’s Web site.
  • Become an authorized user.
  • Access the patient’s medical records in a Web-based interface that represents the patient’s records as a folder icon.
  • Drag the folder from the Web application over to the internal database application.
  • Drop the folder into the database.
• This solution is not possible within the limitations of HTML, for three reasons:
  • The HTML tag set is too limited to represent or identify multiple database fields in the mixture of the medical documents.
  • HTML is incapable of representing the variety of structures in those documents.
  • HTML does not have any mechanism to check data for structural validity before the application attempts to import the data into the target database.
Web Applications of XML: An Example (cont.)
• One technically feasible solution is to require all hospitals and health care agencies to use a single standard system dictated by the government.
  • However, in an environment where many health care agencies and hospitals are in financial difficulty, it is hardly practical to require them to replace their existing heterogeneous systems with a single new system.
• The other way to enable interchange between heterogeneous systems is to adopt a single industry-wide interchange format that serves as the single output format for all exporting systems and as the single input format for all importing systems.
• In other words, we need a standard language to export and import data: XML.
An Introduction to XML
XML: A Simple Example
<?xml version="1.0"?>
<Address>
  <Name> Larry Stewart </Name>
  <Street> 11 Serissa Circle </Street>
  <City> Wayland </City>
  <State> MA </State>
  <Zip> 01778 </Zip>
</Address>
• The above XML fragment contains an address in the U.S.
• We are free to define new tags such as <Name>, <Street>, etc. to identify parts of the address.
• Because the data is plain text with self-describing tags, disparate software tools can create and use it easily.
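To illustrate how easily a tool can consume such a document, here is a minimal sketch that reads the address with Python's standard `xml.etree.ElementTree` module. The choice of Python and the variable names are assumptions for illustration; any XML-aware language would do.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = """<?xml version="1.0"?>
<Address>
  <Name> Larry Stewart </Name>
  <Street> 11 Serissa Circle </Street>
  <City> Wayland </City>
  <State> MA </State>
  <Zip> 01778 </Zip>
</Address>"""

# Parse the document and collect each child element as a tag -> text pair.
address = ET.fromstring(ADDRESS_XML)
fields = {child.tag: child.text.strip() for child in address}

print(fields["Name"])
print(fields["City"], fields["State"], fields["Zip"])
```

Note that the Zip code stays a string ("01778"), so the leading zero survives.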
Well Formed and Valid XML Documents
• An XML document is said to be well formed if it has correct syntax, and is said to be valid if it also specifies a document type definition (DTD) and complies with the constraints expressed in that DTD.
• Any XML parser can process a well-formed document; a validating parser additionally checks the document against its DTD.
• A DTD is a schema for a class of XML documents, appropriate for a given domain.
• A DTD acts as a rule book that allows authors to create new documents with the same characteristics as the base document.
• XML provides strong separation of the structure of a document and the display of that document.
  • The structure is encoded in XML, while the display is managed by the Extensible Stylesheet Language (XSL).
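Well-formedness can be checked with any parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which is a non-validating parser: it reports syntax errors but does not check a document against a DTD. The helper name `is_well_formed` is an assumption for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False if the parser reports a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<Address><City>Wayland</City></Address>"))  # True
print(is_well_formed("<Address><City>Wayland</Address>"))         # False: <City> is never closed
```

Checking validity requires a validating parser, which the Python standard library does not provide.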
XML Entities
• Elements
• Attributes
XML Elements
• XML elements are similar to records in a programming language.
• An element declaration has the following form:
<!ELEMENT ElementName (ElementContents)>
• This declaration defines the relationships among the elements, the order of occurrences of the elements, and their number of occurrences.
XML Elements (cont.)
• If an element X consists of elements A, B, and C in that order, then this would be declared as follows:
<!ELEMENT X (A, B, C)>
• If the elements A, B, and C can appear in any order, SGML allows "&" in place of ",". XML DTDs do not support this connector, so the permitted orders must be listed explicitly or a looser model such as ( A | B | C )* used instead.
• If only one among A, B, or C is used, then the declaration is
<!ELEMENT X ( A | B | C )>
• If element X consists of zero or more As followed by one or more Bs, then the declaration is
<!ELEMENT X ( A*, B+ )>
• A question mark after an element means that the element can be skipped:
<!ELEMENT X ( A, B?, C? )>
• Note that elements can be nested.
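Content models behave much like regular expressions over the sequence of child elements. The following sketch makes that analogy concrete by translating a content model into a Python regular expression. It is a simplification under stated assumptions: it handles only element names, ",", "|", "*", "+", "?", and parentheses (not #PCDATA, ANY, or EMPTY), and the function names are made up for this example.

```python
import re


def model_to_regex(model):
    """Translate a DTD content model such as "(A*, B+)" into a compiled regex.

    Each child element name is encoded as the name followed by one space,
    so the sequence A, A, B becomes the string "A A B ".
    """
    compact = re.sub(r"\s+", "", model)
    # Wrap every element name in a non-capturing group that also consumes its trailing space.
    with_names = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # "," means "followed by", which is plain concatenation in a regex.
    return re.compile(with_names.replace(",", ""))


def children_match(model, child_names):
    """Check whether a list of child element names satisfies the content model."""
    sequence = "".join(name + " " for name in child_names)
    return model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(A, B, C)", ["A", "B", "C"]))   # True
print(children_match("(A*, B+)", ["A", "A"]))         # False: at least one B is required
print(children_match("(A, B?, C?)", ["A", "C"]))      # True: B may be skipped
print(children_match("(A | B | C)", ["A", "B"]))      # False: exactly one is allowed
```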
XML Element Types
• #PCDATA Parsed character data: The element content is text, which the XML parser scans for markup and entity references.
• ANY The element content can contain character data and any declared elements, in any order.
• EMPTY The element content contains no data.
XML Attributes
• Attribute declarations describe information about an element.
• More than one attribute can be defined for one element.
• Attributes are contained within the start tag of an element. They are declared as follows:
<!ATTLIST ElementName
  AttributeName1 DeclaredValue1 DefaultValue1
  AttributeName2 DeclaredValue2 DefaultValue2
  ...
  AttributeNameN DeclaredValueN DefaultValueN >
• The declared value is either a list of permissible values or one of the pre-defined data types.
• The default value specifies whether the attribute is required, optional, or fixed, and which value (if any) applies when the attribute is omitted.
XML Attributes: Declared Value Types
• CDATA Character data: Any text can be used, except that "<", "&", and the quotation mark that delimits the attribute value must be escaped.
• NMTOKEN A name token: any combination of letters, digits, and a few special characters (".", "-", "_", ":"). Unlike an XML name, it may begin with any of these characters. No spaces are allowed.
• NMTOKENS One or more NMTOKEN values separated by spaces.
XML Attributes: Declared Value Types (cont.)
• ID Identifier: The value must be an XML name and must be unique among all ID values in the document.
• IDREF The value of this attribute matches the value of some ID attribute of an element in the same XML document. It is used to point to that element.
• IDREFS One or more IDREF values separated by spaces.
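The ID and IDREF constraints can be illustrated with a small check. Python's standard parser does not read attribute types from a DTD, so this sketch assumes the caller knows which attribute is the ID (`id` here) and which is the IDREF (`ref` here); both names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_ids(root, id_attr="id", idref_attr="ref"):
    """Return (duplicate_ids, dangling_refs) for the given element tree."""
    seen, duplicates = set(), set()
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.add(value)  # an ID value must be unique in the document
            seen.add(value)
    # An IDREF must match some ID value in the same document.
    dangling = {
        elem.get(idref_attr)
        for elem in root.iter()
        if elem.get(idref_attr) is not None and elem.get(idref_attr) not in seen
    }
    return duplicates, dangling


doc = ET.fromstring('<doc><sec id="s1"/><sec id="s2"/><link ref="s2"/><link ref="s9"/></doc>')
print(check_ids(doc))
```

Here "s9" is reported as dangling because no element carries that ID.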
XML Attributes: Default Value Types
• #REQUIRED Some value must be specified for this attribute.
• #IMPLIED The attribute is optional and no default is supplied; when it is not specified, the application decides how to proceed.
• 'value' The value specified is the default. Other permissible values may also be used.
• #FIXED 'value' The value must and can only be the value specified.
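The four kinds of default can be summarized as a small function that mimics what a validating parser does with an element's attributes. This is a sketch, not a parser API: the `declarations` table, its kind labels, and the attribute names used in the example are assumptions for illustration.

```python
def apply_defaults(attrs, declarations):
    """Apply DTD-style attribute defaults to a dict of specified attributes.

    declarations maps an attribute name to (kind, value), where kind is one of
    "REQUIRED", "IMPLIED", "DEFAULT", or "FIXED"; value is None when unused.
    """
    result = dict(attrs)
    for name, (kind, value) in declarations.items():
        if kind == "REQUIRED" and name not in result:
            raise ValueError(f"missing required attribute {name!r}")
        if kind == "FIXED":
            if result.get(name, value) != value:
                raise ValueError(f"attribute {name!r} must be {value!r}")
            result[name] = value
        elif kind == "DEFAULT":
            result.setdefault(name, value)
        # "IMPLIED": optional, and nothing is filled in when it is absent.
    return result


# Hypothetical declarations for one element.
declarations = {
    "NO": ("REQUIRED", None),
    "TITLE": ("IMPLIED", None),
    "LANG": ("DEFAULT", "en"),
    "VERSION": ("FIXED", "1.0"),
}
print(apply_defaults({"NO": "1"}, declarations))
```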
XML Example: FAQ Document
<?xml version="1.0"?>
<!DOCTYPE FAQ SYSTEM "http://www.server.com/DTDs/faq.dtd">
<FAQ>
  <INFO>
    <SUBJECT> XML </SUBJECT>
    <AUTHOR> Lars Marius Garshol </AUTHOR>
    <EMAIL> larsga@ifi.io.no </EMAIL>
    <VERSION> 1.0 </VERSION>
    <DATE> June 20 2005 </DATE>
  </INFO>
  <PART NO="1">
    <Q NO="1">
      <QTEXT> What is XML? </QTEXT>
      <A> Simplified SGML. </A>
    </Q>
    <Q NO="2">
      <QTEXT> What can I use it for? </QTEXT>
      <A> Anything. </A>
    </Q>
  </PART>
</FAQ>
Callouts on the slide: accessing the DTD (the DOCTYPE line), elements and tags, and attributes (NO).
XML Abstract Syntax Tree
FAQ
  INFO
    SUBJECT
    AUTHOR
    EMAIL
    VERSION
    DATE
  PART
    Q
      QTEXT
      A
    Q
      QTEXT
      A
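The abstract syntax tree of the FAQ document can be recovered mechanically by walking the parsed elements. The sketch below is one way to do it with Python's standard library; the `outline` helper is an assumption for this example, and the DOCTYPE line is omitted because the parser used here does not fetch or apply external DTDs.

```python
import xml.etree.ElementTree as ET

FAQ_XML = """<FAQ>
  <INFO>
    <SUBJECT> XML </SUBJECT>
    <AUTHOR> Lars Marius Garshol </AUTHOR>
    <EMAIL> larsga@ifi.io.no </EMAIL>
    <VERSION> 1.0 </VERSION>
    <DATE> June 20 2005 </DATE>
  </INFO>
  <PART NO="1">
    <Q NO="1"><QTEXT> What is XML? </QTEXT><A> Simplified SGML. </A></Q>
    <Q NO="2"><QTEXT> What can I use it for? </QTEXT><A> Anything. </A></Q>
  </PART>
</FAQ>"""


def outline(elem, depth=0):
    """Return the element names of the tree as indented lines, one per element."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(FAQ_XML))))
```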
DTD for the FAQ System (faq.dtd)
<?xml version="1.0"?>
<!ELEMENT FAQ (INFO, PART+)>
<!ELEMENT INFO (SUBJECT, AUTHOR, EMAIL?, VERSION?, DATE?)>
<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT VERSION (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PART (Q+)>
<!ELEMENT Q (QTEXT, A)>
<!ELEMENT QTEXT (#PCDATA)>
<!ELEMENT A (#PCDATA)>
<!ATTLIST PART
  NO CDATA #IMPLIED
  TITLE CDATA #IMPLIED>
<!ATTLIST Q
  NO CDATA #IMPLIED>
Linking in XML
• XML links can be between two or more resources, which can be either files (not necessarily XML or HTML files) or elements in files.
• A link is an element with attributes:
<!ELEMENT simplink ANY>
<!ATTLIST simplink
  ACTUATE (AUTO|USER) "USER"
  SHOW (REPLACE|EMBED|NEW) "REPLACE"
  … >
• The ACTUATE attribute specifies when a link is followed:
  • either when the user explicitly makes a request, for instance by clicking (if the value is USER), or
  • automatically when the system reads the linking element (if the value is AUTO).
Linking in XML (cont.)
• What happens when a link is followed is specified with the SHOW attribute, which can take the following values:
  • EMBED The resource the link points to is inserted into the document.
  • REPLACE The resource the link points to replaces the linking element. (Hence, if you have two different versions of a paragraph, you can link them in such a way that one can see the other version in the same context by following the link.)
  • NEW The resource the link points to is processed or displayed in a new context (e.g., a new page). Ordinary HTML links are of type NEW, as the new page is displayed in place of the previous one.
XML Processing
• SAX (Simple API for XML):
  • SAX is an event-driven API, providing functions to be called whenever specific XML constructs are encountered during parsing.
  • It is used to transform or output data as the XML document is being parsed.
• DOM (Document Object Model):
  • DOM is also an API, focused on the data structure.
  • It provides functions that the client uses to traverse the structure of an XML document, and functions for creating and altering the in-memory structure of a new document.
• XPath (XML Path Language):
  • XPath provides a query syntax for addressing parts of an XML document (i.e., addressing nodes in the abstract syntax tree).
• XSLT (Extensible Stylesheet Language Transformations):
  • XSLT provides rules to transform an XML document into other XML formats or into other formats (such as HTML).
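The difference between these processing styles is easiest to see side by side. The sketch below runs the same small FAQ fragment through Python's standard `xml.sax` (event-driven), `xml.dom.minidom` (in-memory tree), and the limited XPath subset supported by `xml.etree.ElementTree`. The class and variable names are assumptions for this example, and XSLT is left out because the Python standard library has no XSLT processor.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

DOC = (
    "<FAQ><PART>"
    "<Q><QTEXT>What is XML?</QTEXT><A>Simplified SGML.</A></Q>"
    "<Q><QTEXT>What can I use it for?</QTEXT><A>Anything.</A></Q>"
    "</PART></FAQ>"
)


class QuestionCounter(xml.sax.ContentHandler):
    """SAX: the parser calls startElement for every start tag it encounters."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "Q":
            self.count += 1


# SAX: count questions while the document streams past; no tree is built.
counter = QuestionCounter()
xml.sax.parseString(DOC.encode("utf-8"), counter)

# DOM: build the whole tree in memory, then navigate it.
dom = xml.dom.minidom.parseString(DOC)
first_answer = dom.getElementsByTagName("A")[0].firstChild.data

# XPath (subset): address nodes by their path in the tree.
questions = [q.text for q in ET.fromstring(DOC).findall("./PART/Q/QTEXT")]

print(counter.count, first_answer, questions)
```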
XML on the Web
[Figure: architecture diagram. Labels: GUI, DOM, Browser, HTML, XSLT, Client, HTTP, Server, Parse and Process, SAX, Server, DB.]
Web Services
A Simple Example
• Web services are simply applications made accessible over the Web.
• Consider a shipping rate calculator provided by a logistics company. Turning this calculator into a Web service requires the following steps:
  • Encapsulate the logic of the calculator (but not the user interface) into a subroutine.
  • Define the API for the calculator using the Web Services Description Language (WSDL).
  • Host the subroutine on a Web server supporting the Simple Object Access Protocol (SOAP).
  • Publish the calculator definition to an appropriate UDDI (Universal Description, Discovery, and Integration) directory.
A Simple Example (cont.)
• Now, a programmer who wants to use the rate calculator from an e-commerce system can do the following:
  • Look up the service in the UDDI directory.
  • Use SOAP to make a remote call from the client application to the rate calculator.
  • Use the results of the call in the application.
• Web services make it easy for service providers to make business logic available for remote use.
A Simple Example (cont.)
[Figure: the Web Services Host publishes its service to the UDDI Registry; the Web Services Client looks up the service there; client and host then exchange a SOAP call and a SOAP response over the Internet.]
The Vision of Web Services
• Web services provide a straightforward and interoperable means for programs to communicate with each other over the Web.
• Web services also provide directories so that providers can advertise services and users can search for them.
• This makes it possible to develop a market for heavyweight remote services, such as payment systems, logistics, and business messaging.
Remote Procedure Calls
• Web services are built on the concept of remote procedure calls (RPC).
• In an RPC, the calling program, rather than invoking a local subroutine, invokes a client stub, which has the same API as the desired subroutine.
• The client stub communicates with a remote server, where a server stub makes the actual call to the actual subroutine.
• In addition, the calling program must bind its interface to the appropriate server by using a network directory service.
• For Web services, the service directory is implemented using UDDI and the API is defined using WSDL, which is an XML-based language.
• Actual parameters and return values are encoded in text form in XML.
• Web services are built on standard Web servers and HTTP.
• Taken together, these decisions make use of the existing Internet infrastructure for communications between programs.
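The stub pattern can be sketched without any network at all. In the example below, the client stub has the same signature as the real subroutine, encodes its arguments as XML text, and hands the message to a transport; the server stub decodes it and makes the actual call. Everything here is an assumption for illustration: the `<call>`/`<param>`/`<response>` message format is made up (it is not SOAP), the pricing rule is invented, and the "transport" is a direct function call standing in for HTTP.

```python
import xml.etree.ElementTree as ET


def shipping_rate(weight_kg, distance_km):
    """The actual subroutine on the server (hypothetical pricing rule)."""
    return round(5.0 + 0.5 * weight_kg + 0.01 * distance_km, 2)


def server_stub(request_xml):
    """Decode the XML request, call the real subroutine, and encode the result."""
    call = ET.fromstring(request_xml)
    args = {p.get("name"): float(p.text) for p in call.findall("param")}
    result = shipping_rate(**args)
    response = ET.Element("response")
    response.text = repr(result)
    return ET.tostring(response, encoding="unicode")


def client_stub(weight_kg, distance_km, transport=server_stub):
    """Same API as shipping_rate, but the work happens on the other side of the transport."""
    call = ET.Element("call", {"method": "shipping_rate"})
    for name, value in (("weight_kg", weight_kg), ("distance_km", distance_km)):
        param = ET.SubElement(call, "param", {"name": name})
        param.text = repr(float(value))
    reply = transport(ET.tostring(call, encoding="unicode"))
    return float(ET.fromstring(reply).text)


# The caller cannot tell the stub from the local subroutine.
print(client_stub(10, 300) == shipping_rate(10, 300))
```

In a real Web service the transport would be an HTTP request to the server, and the message format would follow SOAP rather than this ad hoc encoding.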
SOAP
• The Simple Object Access Protocol (SOAP) is the specification of how RPCs are implemented over the Web.
• There are three aspects to SOAP:
  • The SOAP calling conventions explain how to represent calls to remote procedures and their responses.
  • The SOAP encoding rules explain how to represent application data, namely the arguments and return values from the remote procedure calls.
  • The SOAP envelope defines the contents of a SOAP message and the rules for processing it.
• SOAP is almost always used with HTTP as the transport protocol, but it can also be used with other communications systems.
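To give a feel for the envelope, the sketch below builds and reads back a SOAP 1.1 request for the shipping rate calculator. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:shipping`, the `GetRate` operation, and its parameter names are hypothetical. No HTTP request is made.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:shipping"                      # hypothetical service namespace


def build_request(weight_kg, destination):
    """Wrap a hypothetical GetRate call in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}GetRate")
    ET.SubElement(call, f"{{{SERVICE_NS}}}WeightKg").text = str(weight_kg)
    ET.SubElement(call, f"{{{SERVICE_NS}}}Destination").text = destination
    return ET.tostring(envelope, encoding="unicode")


def read_request(message):
    """Extract the operation name and its arguments from a SOAP message."""
    body = ET.fromstring(message).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # the first child of Body is the call element
    operation = call.tag.split("}", 1)[1]
    args = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, args


message = build_request(2.5, "Ottawa")
print(read_request(message))
```

Note that every argument travels as text; turning "2.5" back into a number is the job of the encoding rules and the receiving stub.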
WSDL
• The Web Services Description Language (WSDL) is the interface definition language for Web services.
• Most commonly, WSDL is used to describe services that are available via SOAP and HTTP.
• WSDL defines Web services in terms of the following six concepts:
  • Types: The data type definitions that are used to describe messages.
  • Message: An abstract definition of the data being transmitted.
  • Port Type: A set of abstract operations, each of which has input and output messages.
  • Binding: The concrete protocol and data format specifications.
  • Port: An address for a single communication endpoint.
  • Service: The aggregation of a set of related ports.
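Because a WSDL description is itself an XML document, a client can inspect it with ordinary XML tools. The sketch below lists the operations of each port type in a deliberately incomplete WSDL 1.1 fragment. The WSDL namespace is the standard one; the `RateService` definition, its `urn:example:shipping` namespace, and all message and operation names are hypothetical, and the types, binding, port, and service sections are omitted for brevity.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical, deliberately incomplete WSDL fragment: messages and one port type only.
WSDL_DOC = """<definitions name="RateService"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="urn:example:shipping"
    targetNamespace="urn:example:shipping">
  <message name="GetRateRequest"/>
  <message name="GetRateResponse"/>
  <portType name="RatePortType">
    <operation name="GetRate">
      <input message="tns:GetRateRequest"/>
      <output message="tns:GetRateResponse"/>
    </operation>
  </portType>
</definitions>"""


def list_operations(wsdl_text):
    """Map each port type name to its (operation, input message, output message) triples."""
    root = ET.fromstring(wsdl_text)
    result = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        operations = []
        for op in port_type.findall("wsdl:operation", WSDL_NS):
            operations.append((
                op.get("name"),
                op.find("wsdl:input", WSDL_NS).get("message"),
                op.find("wsdl:output", WSDL_NS).get("message"),
            ))
        result[port_type.get("name")] = operations
    return result


print(list_operations(WSDL_DOC))
```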
UDDI
• Universal Description, Discovery, and Integration (UDDI) is not a protocol so much as a process.
• The idea is to operate directories or registries of business entities and business services so that people and programs can find providers of the Web services they need.
• See www.uddi.org for further information.
References
• Dr. Stan Matwin’s lecture slides
• Dr. Thomas Tran’s slides
• Lars Marius Garshol, An Introduction to XML (http://www.garshol.priv.no/download/text/xml-intro/index-en.html)
• Jon Bosak, XML, Java, and the Future of the Web (http://www.ibiblio.org/pub/sun-info/standards/xml/why/xmlapps.htm)