XML for Information Management
12.1.-16.1.2009, University of Erlangen-Nuremberg, Computational Linguistics
Instructor: Professor Airi Salminen, http://users.jyu.fi/~airi/
Day 2: Background of XML
Outline: 1. Markup languages 2. Structured documents 3. World Wide Web Consortium
1. Markup languages Markup • intended for human readers • intended for computers
1. Markup languages
Markup for human readers
• punctuational
• presentational
to clarify the written expression
Without markup: Texthasalwaysincludedsomekindofmarkupalsobeforethetimeofcomputers
With markup: Text has always included some kind of markup, also before the time of computers.
1. Markup languages
Markup for computers
• presentational
• procedural
• descriptive (declarative)
to provide information for a software module
• In markup languages there is a clear separation of markup and primary content. Markup is metadata, adding some information to the primary data.
1. Markup languages
Presentational markup: information about the way the software module should present the primary content to the human perceiver.
The markup in an HTML file:
    In <i>markup languages</i> there is clear separation of <i>markup</i> and <i>primary content</i>. Markup is <i>metadata</i>, adding some information to the primary data.
The tags <i> and </i> represent presentational markup in HTML.
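The separation of markup and primary content can be sketched in a few lines of Python. This is only an illustration, assuming the standard-library `html.parser` module; the class name `MarkupSplitter` and the function `split_markup` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects tags (markup) and text (primary content) separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored in this sketch; only the tag name is kept.
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

    def handle_data(self, data):
        self.text_parts.append(data)


def split_markup(source):
    """Return (list of tags, primary content with the tags removed)."""
    parser = MarkupSplitter()
    parser.feed(source)
    parser.close()  # flush any text still buffered by the parser
    return parser.tags, "".join(parser.text_parts)


source = (
    "In <i>markup languages</i> there is clear separation of "
    "<i>markup</i> and <i>primary content</i>."
)
tags, content = split_markup(source)
print(tags)
print(content)
```

The tags end up in one list and the readable sentence in one string, which is the separation the slide describes.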
1. Markup languages
Procedural markup: a processing instruction for the software module.
The markup in an XML file:
    <![CDATA[<element>Example of an XML element</element>]]>
The strings <![CDATA[ and ]]> represent procedural markup in XML.
• <![CDATA[ instructs the XML processor to regard all text before ]]> as character data.
• ]]> instructs the XML processor to continue normal identification of markup.
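The effect of a CDATA section on an XML processor can be observed with a small sketch. It assumes Python's standard `xml.etree.ElementTree` parser; the wrapper element `note` is a hypothetical name added here, because a CDATA section must appear inside an element.

```python
import xml.etree.ElementTree as ET

# The CDATA section from the slide, wrapped in a hypothetical <note> element.
doc = "<note><![CDATA[<element>Example of an XML element</element>]]></note>"

root = ET.fromstring(doc)

# Everything between <![CDATA[ and ]]> arrives as character data,
# so <note> has text content but no child elements.
cdata_text = root.text
child_count = len(root)

print(cdata_text)
print(child_count)
```

The parser reports no `element` child at all: the tags inside the CDATA section were treated as plain text.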
1. Markup languages
Declarative markup describes the content of a piece of primary content, what it is, or declares that the piece is a member of a particular class.
The markup in an XML file:
    <student>
      <first_name>Steve</first_name>
      <last_name>Chung</last_name>
      <email>steve.chung@university.ca</email>
    </student>
XML is primarily for declarative markup.
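Because declarative markup names what each piece of content is, an application can pick out the pieces by name. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

student_xml = """<student>
  <first_name>Steve</first_name>
  <last_name>Chung</last_name>
  <email>steve.chung@university.ca</email>
</student>"""

root = ET.fromstring(student_xml)

# Each child element name says what its content is, so the element
# names can serve directly as field names of a record.
record = {child.tag: child.text for child in root}

print(record)
```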
1. Markup languages
Markup in XML
• All markup delivers information to the XML processor. A DTD represents metamarkup, facilitating the definition of the markup vocabulary.
• Markup in an XML document is usually classified with respect to the application.
• Processing instructions represent procedural markup.
• Element tags represent declarative markup.
• The specification of an XML application can give different kinds of meanings to element names: they can be processing instructions to the application, or instructions about the way the content should be presented by the application.
1. Markup languages
Example of HTML markup
    <html>
      <head>
        <title>University of Jyväskylä</title>
      </head>
      <body>
        <h2>Faculties</h2>
        <ul>
          <li>Humanities
          <li>Information Technology
          <li>Social Sciences
        </ul>
        <br>
        <address>admin@jyu.fi</address>
      </body>
    </html>
The element markup describes the structure for WWW publishing.
1. Markup languages
The same primary content, with XML markup describing the content of the elements.
    <university>
      <name>University of Jyväskylä</name>
      <faculties>Faculties
        <faculty>Humanities</faculty>
        <faculty>Information Technology</faculty>
        <faculty>Social Sciences</faculty>
      </faculties>
      <contact_email>admin@jyu.fi</contact_email>
    </university>
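With element names that describe the content, an application can ask for "the faculties" or "the contact e-mail" directly instead of relying on list items or headings. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` and its limited XPath support:

```python
import xml.etree.ElementTree as ET

university_xml = """<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
  <contact_email>admin@jyu.fi</contact_email>
</university>"""

root = ET.fromstring(university_xml)

# Select elements by what they are, not by how they are displayed.
faculty_names = [f.text for f in root.findall("./faculties/faculty")]
contact = root.findtext("contact_email")

print(faculty_names)
print(contact)
```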
1. Markup languages
[Figure: two tree diagrams. Logical structure of the HTML document (elements html, head, title, body, h2, ul, li, br, address) and logical structure of the XML document (elements university, name, faculties, faculty, contact_email), each with the primary content as leaves.]
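The logical structure of the XML document can also be printed as an indented outline of element names. This is a sketch only, assuming Python's standard `xml.etree.ElementTree`; the helper `outline` is a hypothetical name, and the XML is the university example from the earlier slide.

```python
import xml.etree.ElementTree as ET

university_xml = """<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
  <contact_email>admin@jyu.fi</contact_email>
</university>"""


def outline(element, depth=0):
    """Return the element tree as a list of indented element names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(university_xml))
print("\n".join(tree_lines))
```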
2. Structured documents Structured document • structure, content, and external presentation can be separated from each other and processed separately • structural components have names • structural components can be recognized by software modules • possible to define the structure
2. Structured documents
Structured document: structure, content, layout
• Structure: different languages for defining the structure, e.g., DTD, XML Schema, RELAX NG for XML
• Content: an open language standard, e.g., SGML, XML
• Layout: different languages for defining the layout, e.g., CSS and XSL for XML
2. Structured documents
Structured document: structure, content, layout
Example files: DTD.txt, rhymes.txt, rhymes.xml, style.txt, style.css, rhymes with style attachment.txt, rhymes with style attachment.xml
2. Structured documents Management of structured documents • document management • management of the data contained in documents
2. Structured documents
Characteristics in the management of structured documents
• Design. Adopting the approach of structured document management in an environment often requires careful planning before the creation of documents. This includes schema design and layout design.
• Content production. Content can be produced by different types of software, e.g., by a syntax-directed editor. The validity of the content is checked against the schema.
• Evolution. Schema versioning, layout versioning.
• Operations. The most typical operation is some kind of transformation.
• Software. Many kinds of software systems are used.
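A transformation typically takes a document in one structure and produces a document in another. The sketch below turns the faculty list of the university example into an HTML list. It is a plain Python stand-in, assuming the standard `xml.etree.ElementTree` module, for what would more commonly be written in a transformation language such as XSLT; the function name `faculties_to_html` is hypothetical.

```python
import xml.etree.ElementTree as ET

university_xml = """<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
  <contact_email>admin@jyu.fi</contact_email>
</university>"""


def faculties_to_html(university):
    """Build a <ul> element with one <li> per <faculty> element."""
    ul = ET.Element("ul")
    for faculty in university.findall("./faculties/faculty"):
        li = ET.SubElement(ul, "li")
        li.text = faculty.text
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(ul, encoding="unicode")


html_list = faculties_to_html(ET.fromstring(university_xml))
print(html_list)
```

The source document is left unchanged, so the same content can be transformed into other target structures as well.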
2. Structured documents Database languages • definition languages • query languages Structured document languages • definition languages • style languages • various manipulation, transformation and query languages
3. World Wide Web Consortium • W3C develops specifications to support the use of the web, publicly available at http://www.w3.org/TR/ • Development is systematic • The development process is specified and published
3. World Wide Web Consortium Phases of the development process • Working Draft: represents work in progress. • Candidate Recommendation: has received significant review from its immediate technical community, explicit call for implementation and technical feedback. • Proposed Recommendation: represents consensus in the development group, proposed to the Advisory Committee for review. • Recommendation: represents consensus within W3C, widespread implementation encouraged.
3. World Wide Web Consortium What happens to a W3C Recommendation? • It remains a Recommendation indefinitely. • W3C rescinds the Recommendation. A report called Rescinded Recommendation is published. • A new version of the Recommendation is developed. • Minor modifications are made. A report called Proposed Edited Recommendation is published.