EE557: Server-Side Development
Lecturer: David Molloy
Time: Mondays 10am-1pm
Notes: http://www.eeng.dcu.ie/~ee557
Mailing List: ee557@list.dcu.ie
List URL: http://list.dcu.ie/mailman/listinfo.cgi/ee557

XML – eXtensible Markup Language
• Invented to provide a standard and powerful way to describe ANY data
• XML information can be exchanged across platforms, languages and applications
• HTML and XML are both related to the Standard Generalized Markup Language (SGML)
• SGML is very complex and has limited mass appeal; it is aimed at handling large documents
• What's wrong with HTML?
- Nothing, for its role: presentation
- HTML consists mostly of tags defining the appearance of text
- CSS helps with separation but is a half-way measure
- XML is extensible: tags can be defined by individuals or organisations for some specific application. HTML is limited to the tags defined within W3C standards
- HTML and XML are actually complementary technologies
- XML is the better solution for structuring and/or sharing data

XML – eXtensible Markup Language (continued)
• Consider a company wishing to store personnel information on staff. How should they store this information?
• Flat text files: no structure, portable, platform independent, poor search capabilities, limited to storing text data, average interaction with information systems
• Word/Excel/Access: no platform independence, no portability, application specific, limited search capabilities, poor interaction with information systems, can store images and some multimedia formats
• Databases: platform and vendor specific (to some extent), average portability of data, application specific, good searching capabilities, good synchronisation and mutual exclusion protection, often limited to storing non-text data in binary objects
• XML: platform independent, excellent portability, application independent, excellent searching capabilities, synchronisation/mutual exclusion? (left to application code), can store any type of data, excellent for interaction between information systems

XML vs Databases
• What about databases?
• Relational databases process data independently of its context
• They are well suited to data that fits easily into rows and columns
• They are not very suited to handling multimedia content and rich data such as audio, video, nested data structures and complex documents (BLOBs!)
• Databases often mimic XML storage by translating between XML and some other proprietary data format
• This is typically achieved via external conversion layers that handle the storage conversion
• For example, some databases provide the facility to convert objects to database types automatically (or you can do it manually!)
• XML is most commonly used to represent object-oriented data for the transfer of information; it is less commonly used for mainstream storage

Sample XML Document Snippet

    <employee>
      <ident>3348498</ident>
      <name>
        <lastname>Peterson</lastname>
        <firstname>Sam</firstname>
        <title>Dr.</title>
      </name>
      <phonedetails>
        <extension>8221</extension>
        <companyprefix>700</companyprefix>
        <regionprefix>1</regionprefix>
        <intprefix>+353</intprefix>
      </phonedetails>
      <department>
        <title>Software Development</title>
        <depid>8</depid>
      </department>
      <location>
        <building>Aston Quay</building>
        <room>A142</room>
      </location>
    </employee>

• The document contains two different types of information:
- markup
- character data

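As a rough illustration of the markup / character data split, the sketch below parses a shortened copy of the employee snippet with Python's standard `xml.etree.ElementTree` module. The helper name `leaf_values` and the path-style keys are assumptions made for this example only: tag names (markup) become the keys and the text between the tags (character data) becomes the values.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the employee snippet from the slide.
EMPLOYEE_XML = """
<employee>
  <ident>3348498</ident>
  <name>
    <lastname>Peterson</lastname>
    <firstname>Sam</firstname>
    <title>Dr.</title>
  </name>
  <phonedetails>
    <extension>8221</extension>
    <companyprefix>700</companyprefix>
    <regionprefix>1</regionprefix>
    <intprefix>+353</intprefix>
  </phonedetails>
</employee>
"""


def leaf_values(xml_text):
    """Map the path of every leaf element (markup) to its character data."""
    root = ET.fromstring(xml_text.strip())
    values = {}

    def walk(element, path):
        children = list(element)
        if not children:
            # No child elements: whatever is left is character data.
            values[path] = (element.text or "").strip()
        for child in children:
            walk(child, path + "/" + child.tag)

    walk(root, root.tag)
    return values


employee = leaf_values(EMPLOYEE_XML)
print(employee["employee/name/lastname"])
print(employee["employee/phonedetails/intprefix"])
```

The same tag name can appear in different places (the full snippet has `<title>` under both `<name>` and `<department>`), which is why the sketch keys values by path rather than by tag name alone.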
XML Features
• Simplicity: information in XML is easy to read and understand, makes sense, and is easily processed by computers
• Self-Describing: unlike databases, XML data does not require a relational schema, external data type definitions etc., because the data itself contains this information
• Open and Extensible: you can add other elements, to your own specification, when needed
• Application Independence: using XML, data is no longer dependent on a specific application for creation, viewing and editing. XML is to data what Java is to applications. Java = run anywhere, XML = data anywhere
• Data Format Integration: XML documents can contain any imaginable data type
• One Data Source, Multiple Views: we allow computer applications to process and display our data in different ways. HTML is just one way!

XML Features (continued)
• Data Presentation Modification: as with CSS style sheets, you can change the look and feel of documents or entire websites by using XSL style sheets, without manipulating the data itself
• Internationalization: XML supports multilingual documents
• Future-Oriented: XML is the endorsed industry standard of the W3C and is supported by all leading software providers. It is a standard in industries (eg. health)
• Improved Data Searches: by searching both data and metadata we drastically improve the accuracy of searches, as well as their power. HTML searches are largely limited to word matching
• Enables e-Commerce Transactions: each transaction is actually formed of multiple transactions between a host of agents
- B2B e-Commerce
- B2C e-Commerce
- Enterprise integration (between local systems)
• Previously such systems were integrated using protocols such as CORBA
• XML provides the option of integrating using standardized data (more on this later!)

XML Document Structure
• An XML document is made up of two sections:
- Prolog section
- Instance section

Prolog Section

    <?xml version="1.0"?>
    <!DOCTYPE book SYSTEM "DTD/book.dtd">

• Document Type Definition (DTD): sets all the rules for the document regarding elements, attributes and other components
- Internal DTD: contained completely within the XML document
- External DTD: in a separate document, referenced from the XML document
• Typically use the SYSTEM keyword for a relative or absolute file path
• Alternatively, use the PUBLIC keyword for a standard defined by the W3C or another consortium
• Eg.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

XML Document Structure: Instance Section
• Consists of the actual content of the document: everything!
• Principally:
- Elements such as <name>David</name>
- Attributes such as <image src="images/test.jpg" />
- Entities
- Unparsed data
• All described subsequently!

XML Document Structure: Elements
• Elements are the most important part of an XML document. An element consists of an opening tag and a closing tag
• An element can have one of several types of content:
- Element content: contains only other elements. Eg. <name> in <name><fn>David</fn><ln>Molloy</ln></name>
- Mixed content: contains both text and other elements. Eg. <para> in <para>This is <emphasis>very important!</emphasis></para>
- Simple content: contains just text. Eg. <lastname>Molloy</lastname>
- Empty content: contains no information between its tags. Eg. <image src="test.jpg"></image>

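The four content types can be told apart mechanically. The sketch below is only an illustration using Python's standard `xml.etree.ElementTree`; the function name `content_type` and its return labels are made up for this example. It inspects one element instance, whereas in practice the permitted content type is declared in the DTD or schema.

```python
import xml.etree.ElementTree as ET


def content_type(element):
    """Classify an element instance as 'element', 'mixed', 'simple' or 'empty'."""
    has_children = len(element) > 0
    # Text can sit before the first child (.text) or after any child (.tail).
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


# The four examples from the slide.
name = ET.fromstring("<name><fn>David</fn><ln>Molloy</ln></name>")
para = ET.fromstring("<para>This is <emphasis>very important!</emphasis></para>")
lastname = ET.fromstring("<lastname>Molloy</lastname>")
image = ET.fromstring('<image src="test.jpg"></image>')

for sample in (name, para, lastname, image):
    print(sample.tag, "->", content_type(sample))
```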
XML Document Structure: Elements (continued)
• XML element names are case-sensitive
• XML elements always require a begin and an end tag (unlike HTML)
• There is a shorthand for empty elements:
- <image src="test.jpg"></image>
- can be written as: <image src="test.jpg" />
• XML documents must be 'well-formed'
- <tag1><tag2></tag1></tag2> = not well-formed!
• Well-formed documents are not necessarily 'valid' XML. Valid XML must also follow the constraints set upon it by its DTD or schema
• XML documents can only have a single root element
• The root element can have further subelements, and so on

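A quick way to see these rules in action is to hand a few strings to a parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which checks well-formedness only (it does not validate against a DTD); the helper name `is_well_formed` is an assumption for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested tags are accepted.
print(is_well_formed("<tag1><tag2></tag2></tag1>"))
# Overlapping tags from the slide are rejected.
print(is_well_formed("<tag1><tag2></tag1></tag2>"))
# The empty-element shorthand is accepted.
print(is_well_formed('<image src="test.jpg" />'))
# Element names are case-sensitive, so <Name> is not closed by </name>.
print(is_well_formed("<Name>David</name>"))
# Only a single root element is allowed.
print(is_well_formed("<a></a><b></b>"))
```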
XML Document Structure: Attributes
• In addition to content, elements can have attributes
• XML attributes are similar to HTML attributes, allowing you to attach characteristics to elements
• Example: <image src="images/test.jpg" />
• Attributes have a name and a value and are placed within the start tag
• Legal attributes for elements can be defined within the DTD
• Quotes must be used, either single or double:
- <topic name="Brian O'Sullivan" />
- or
- <topic name='The Use of "s" in Popular Literature'></topic>

XML Document Structure: Attributes (continued)
• When to use attributes, when to use elements?

    <phone number="+35318008583" />

    <phone>
      <intcode>+353</intcode>
      <localcode>1</localcode>
      <prefix>800</prefix>
      <extension>8583</extension>
    </phone>

• There is no specification or requirement; you can often use either
• Rule of thumb: "If data can have multiple values, or is very lengthy, the data most likely belongs in an element"
• As in the example above, if you wish to access subdata, an element is the better choice

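The practical difference shows up when the data is read back. The following sketch, using Python's standard `xml.etree.ElementTree`, parses both forms of the phone example; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Attribute form: the whole number is one opaque string.
as_attribute = ET.fromstring('<phone number="+35318008583" />')

# Element form: each part of the number is individually addressable.
as_elements = ET.fromstring(
    "<phone>"
    "<intcode>+353</intcode>"
    "<localcode>1</localcode>"
    "<prefix>800</prefix>"
    "<extension>8583</extension>"
    "</phone>"
)

number = as_attribute.get("number")
extension = as_elements.findtext("extension")
# The full number can still be rebuilt from the parts when needed.
rebuilt = "".join(child.text for child in as_elements)

print(number)
print(extension)
print(rebuilt)
```

With the attribute form, extracting the extension would require the application to know how to split the string; with the element form, the structure is carried by the document itself.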
XML Document Structure: Entities
• How do we write <HTML> as text in an XML document that describes how to write web pages? What can happen? The parser would treat it as markup and expect a matching closing tag
• Entity references provide a way to overcome this problem
• An entity reference is a special data type in XML used to refer to another piece of data
• Format: &[entityname];
• When the XML parser sees an entity reference, the specified substitution value is inserted and no processing of that value occurs
• Five standard pre-defined entities: &lt; &gt; &amp; &quot; and &apos;
• Writing &lt;HTML&gt; solves our issue!

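A small round trip makes the substitution visible. This sketch uses `escape` from Python's standard `xml.sax.saxutils` (which replaces `&`, `<` and `>` by default) and then parses the result with `xml.etree.ElementTree`; the sample sentence is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text that would be mistaken for markup if written into XML as-is.
raw = "Start every page with <HTML> & end it with </HTML>"

# Replace the special characters with pre-defined entity references.
escaped = escape(raw)
print(escaped)

# The parser substitutes the entity references back into plain characters.
para = ET.fromstring("<para>" + escaped + "</para>")
print(para.text)
```

After parsing, `<para>` has no child elements: the `<HTML>` text arrived as character data, not as a tag.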
XML Document Structure: Entities (continued)
• Entities are not restricted to handling escape characters within data
• We can use them to define variable or constant values within our document:

    <!ENTITY rspca "Royal Society for the Prevention of Cruelty to Animals (RSPCA)">

• We can subsequently just use &rspca; to represent our constant
• Suitable for variable content also: we can define an entity to represent the author's email at one centralised point in the XML document. If it is changed later, the change will propagate throughout the document
• Entities can be entire files (notes example):

    <!ENTITY servlets SYSTEM "servlets.xml">
    &servlets;

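The internal entity case can be tried directly. In the sketch below the entity is declared in an internal DTD subset and expanded by the parser behind Python's standard `xml.etree.ElementTree`; the `<notice>` root element and its sentence are assumptions added for the example. The external file form (`SYSTEM "servlets.xml"`) is deliberately left out, since whether external entities are loaded depends on the parser and its configuration.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the entity, followed by a document using it.
DOC = """<?xml version="1.0"?>
<!DOCTYPE notice [
  <!ENTITY rspca "Royal Society for the Prevention of Cruelty to Animals (RSPCA)">
]>
<notice>Donations go to the &rspca;.</notice>"""

notice = ET.fromstring(DOC)
# The reference &rspca; has been replaced by the declared value.
print(notice.text)
```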
XML Document Structure: Unparsed Data
• In XML there are 3 kinds of data that the parser does not process as document content or markup
• Comments: in XML these are exactly like those in HTML

    <!-- this is a comment -->

• Character Data (CDATA): allows you to put information that might be recognised as markup anywhere characters might occur. CDATA sections begin with <![CDATA[ and end with ]]>. The parser does not interpret anything between these delimiters as markup. Suitable for program listings or where spacing should be preserved

    <programlisting>
    <![CDATA[
    <HTML>
      <HEAD> <TITLE>Test HTML Page</TITLE> </HEAD>
      <BODY> <H1>Hello World!</H1> </BODY>
    </HTML>
    ]]>
    </programlisting>

• Processing Instructions (PIs): allow documents to contain instructions for applications. A PI begins with <? and ends with ?>

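The effect of a CDATA section and of comments can be checked with a few lines of Python. This sketch uses the standard `xml.etree.ElementTree`, whose default parser skips comments; the wrapping `<notes>` element and the shortened HTML string are assumptions for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<notes>
  <!-- this is a comment -->
  <programlisting><![CDATA[<HTML><BODY><H1>Hello World!</H1></BODY></HTML>]]></programlisting>
</notes>"""

notes = ET.fromstring(DOC)
listing = notes.find("programlisting")

# The CDATA content arrives as plain text, angle brackets included.
print(listing.text)
# No <HTML>, <BODY> or <H1> elements were created inside the listing.
print(len(listing))
# The comment did not become part of the element tree.
print([child.tag for child in notes])
```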
Document Type Definitions (DTDs)
• Why constrain our XML documents? XML is extensible, with hundreds and thousands of ways to represent data. A DTD helps us define what the data in our XML document means
• DTDs preserve the XML document structure
• DTDs declare the legal elements and attributes in a document, their nesting and their occurrence indicators
• A DTD is simply a text file with declarations for the elements and attributes. It can be internal or external
• External is preferable: the DTD can be reused across multiple XML documents

Document Type Definitions (DTDs): Element Declarations

    <!ELEMENT Author (Name)+>

• + is the occurrence indicator
- + = must appear at least once, with no upper limit (1...N)
- ? = may appear once or not at all (0,1)
- no indicator (default) = must appear once and only once (1)
- * = may appear any number of times, including 0 (0...N)
• An element can have multiple subelements

    <!ELEMENT Name (Firstname, Lastname, Title?)>

• #PCDATA signifies that the element contains parsed character data. The parser expects only character data and no further tags
• The | symbol can be used as an OR operator

    <!ELEMENT Figure (Graphic | Table | Screen-shot)>

Document Type Definitions (DTDs): Example

    <!ELEMENT Author (Name+)>
    <!ELEMENT Name (Firstname, Lastname, Qualification*)>
    <!ELEMENT Qualification (#PCDATA)>
    <!ELEMENT Firstname (#PCDATA)>
    <!ELEMENT Lastname (#PCDATA)>

• Some valid XML would be:

    <Author>
      <Name>
        <Firstname>Joe</Firstname>
        <Lastname>Smith</Lastname>
        <Qualification>B.Eng. PhD MIEEE</Qualification>
      </Name>
      <Name>
        <Firstname>Mary</Firstname>
        <Lastname>Jones</Lastname>
      </Name>
    </Author>

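Occurrence indicators behave much like regular-expression quantifiers, which the sketch below exploits. It is not DTD validation (a validating parser does that, and Python's standard library does not include one); it is a hand-rolled illustration that handles only flat, comma-separated sequences, with no `|` groups and no `#PCDATA`. The helper names `model_to_regex` and `children_match` are invented for this example.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(content_model):
    """Turn a flat sequence such as 'Firstname, Lastname, Qualification*'
    into a regular expression over space-terminated child tag names."""
    parts = []
    for item in content_model.split(","):
        item = item.strip()
        # A trailing +, ? or * is the occurrence indicator; none means exactly once.
        indicator = item[-1] if item[-1] in "+?*" else ""
        name = item.rstrip("+?*")
        parts.append("(?:" + re.escape(name) + " )" + indicator)
    return re.compile("".join(parts))


def children_match(element, content_model):
    """Check the order and count of an element's children against the model."""
    child_names = "".join(child.tag + " " for child in element)
    return model_to_regex(content_model).fullmatch(child_names) is not None


NAME_MODEL = "Firstname, Lastname, Qualification*"

joe = ET.fromstring(
    "<Name><Firstname>Joe</Firstname><Lastname>Smith</Lastname>"
    "<Qualification>B.Eng. PhD MIEEE</Qualification></Name>"
)
mary = ET.fromstring(
    "<Name><Firstname>Mary</Firstname><Lastname>Jones</Lastname></Name>"
)
broken = ET.fromstring("<Name><Lastname>Jones</Lastname></Name>")
no_names = ET.fromstring("<Author></Author>")

print(children_match(joe, NAME_MODEL))      # Qualification present
print(children_match(mary, NAME_MODEL))     # Qualification omitted (allowed by *)
print(children_match(broken, NAME_MODEL))   # Firstname missing
print(children_match(no_names, "Name+"))    # + requires at least one Name
```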
Document Type Definitions (DTDs): Attribute Declarations
• Attributes contain additional data about an element

    <!ATTLIST elemName
        attName attType default-decl>

    <!ATTLIST chapter
        title  CDATA #REQUIRED
        number CDATA #REQUIRED>

    <chapter title="Test Title" number="4" />

• We can also specify a set of values the attribute must take on for validity:

    <!ATTLIST code
        type (Java | C | Cpp) #REQUIRED>

• Enumerated values must be name tokens, so a value such as C++ is not allowed (+ is not a name character)
• #REQUIRED = the attribute must be present
• #IMPLIED = the attribute is optional
• #FIXED = provides a default value that cannot be modified by the document author

Document Type Definitions (DTDs): Attribute Declarations (continued)
• Alternatively, we can provide a default value for an attribute, which will be set if the document author does not override it

    <!ATTLIST country name (UnitedStates|Canada|Other) "Other">

• Enumerated values must be name tokens, so they cannot contain spaces

Document Type Definitions (DTDs): Attribute Declarations (example)
• Example: consider the following declarations

    1. <!ATTLIST person gender CDATA "male">
    2. <!ATTLIST person gender CDATA #FIXED "male">
    3. <!ATTLIST person gender CDATA #REQUIRED>
    4. <!ATTLIST person gender CDATA #IMPLIED>
    5. <!ATTLIST person gender (male|female) "male">

• <person gender="male"> satisfies all of the above
• <person gender="female"> violates the 2nd declaration above
• <person gender="unknown"> violates the 2nd and 5th declarations above

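The reasoning behind the three verdicts can be written out as a small check. The sketch below is a hand-rolled model of the attribute default rules, not a DTD validator; `check_attribute`, `DECLARATIONS` and `violated` are names invented for this illustration, and the first declaration is taken to be a plain default value of "male".

```python
def check_attribute(value, allowed=None, required=False, fixed=None, default=None):
    """Return the effective attribute value, or raise ValueError on a violation.

    value is None when the document author omitted the attribute.
    """
    if value is None:
        if required:
            raise ValueError("attribute is #REQUIRED but missing")
        # Fall back to the fixed or default value; stays None for #IMPLIED.
        value = fixed if fixed is not None else default
    if fixed is not None and value != fixed:
        raise ValueError("attribute is #FIXED and cannot be changed")
    if allowed is not None and value is not None and value not in allowed:
        raise ValueError("value is not in the enumerated list")
    return value


# The five gender declarations, keyed by their position in the list.
DECLARATIONS = {
    1: dict(default="male"),                              # CDATA "male"
    2: dict(fixed="male"),                                # CDATA #FIXED "male"
    3: dict(required=True),                               # CDATA #REQUIRED
    4: dict(),                                            # CDATA #IMPLIED
    5: dict(allowed={"male", "female"}, default="male"),  # (male|female) "male"
}


def violated(value):
    """List the numbers of the declarations that the given value violates."""
    failures = []
    for number, rules in DECLARATIONS.items():
        try:
            check_attribute(value, **rules)
        except ValueError:
            failures.append(number)
    return failures


print(violated("male"))
print(violated("female"))
print(violated("unknown"))
print(violated(None))  # attribute omitted altogether
```

The last call goes one step beyond the slide: omitting the attribute entirely breaks only the #REQUIRED declaration, because the others either supply a value or treat the attribute as optional.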
XML Schemas
• A schema is a model that describes the structure of information
• XML Schema is a W3C Recommendation (finalized in 2001)
• It seeks to improve on DTDs by adding more data types and by being written in XML format itself
• DTDs are limited for describing other data types, such as numbers, dates and currency values; they describe only character data
• Schemas provide extra functionality over DTDs, but with added complexity
• We cover DTDs to demonstrate the concept in a quickly understandable way

Sample Question 1
• We wish to store data on customers of a bank. Each customer has the following structure:
- Customer Name (Firstname, Lastname, Title) (required)
- Account Type (Cashsave, SSIA, Current, Savings) (1 or more accounts)
- The balance within the account (required)
- Date of Account Creation (optional)
• Generate an XML document which could represent this information (make two sample customers) and a corresponding DTD against which you should validate the document.

Sample Question 2
• Your boss, within the University in which you work, has informed you that he wishes you to manage authentication for the new Virtual Learning System, which is being developed by a team of programmers. He tells you that he wants all user details to be held in one XML file called users.xml, which should be validated by a corresponding DTD file called users.dtd. The following data structures need to be followed:
- Users of the system can be either students or lecturers
- Data to be maintained on either type: firstname, surname, title (optional), username, email address, full address details
- A student must be registered on either exactly one programme or no programme at all (for a programme, store the programme name, a four-digit code, a one-digit year and whether the programme is active, ie. not deferred)
- Both students and lecturers can be registered for 0 or more modules (store semester 1 or 2 and the module id as a five-digit code)
- Lecturers must have a staff id
• Generate an XML file containing data for at least two users (one lecturer, one student) and create a corresponding and 'clever' document type definition (DTD) file. Use both elements and attributes to represent the data.
