Annotated Text. Ordinary text (e.g.): This is an ordinary text document. Annotated text (e.g.): <html><title>Sample Document</title><body>This is an annotated text document.</body></html>
1. Text Annotation Techniques by Brian Wanner
2. Annotated Text
3. Document Type Definition (DTD) It is a specification that accompanies an annotated document.
It helps the parser identify which codes (markup) separate paragraphs and which identify topic headings.
It also indicates to the parser how each tag is to be processed.
The DTD for a document is generally placed at the top of the document.
4. DTD Example This is an XML document with a Document Type Definition.
PCDATA - Parsed Character Data
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.
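The example document from this slide is not preserved in the text, so the following is only a minimal sketch of what an XML document with an internal DTD can look like. The "note" document, its element names, and its contents are hypothetical. Python's standard xml.etree.ElementTree parser reads the DOCTYPE but does not validate the document against it, so this shows the layout of a DTD rather than validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document (not from the original slides): the DTD sits at the
# top, inside the DOCTYPE declaration, and declares each element's content.
NOTE_XML = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Alice</to>
  <from>Bob</from>
  <body>Lunch at noon?</body>
</note>"""

# ElementTree parses the document but does NOT check it against the DTD.
root = ET.fromstring(NOTE_XML)
children = [(child.tag, child.text) for child in root]
print(root.tag, children)
```

Checking a document against its DTD requires a validating parser, which the Python standard library does not include.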
5. Why use DTD? Each file can carry a description of its own format with it.
Independent groups of people can agree to use a common DTD for interchanging data.
Your application can use a standard DTD to verify that the data you receive from the outside world is valid.
6. Standard Generalized Markup Language (SGML)
SGML is a metalanguage: a language for writing other languages in.
SGML is used to define the abstract structure of a DTD.
7. SGML Each markup language defined in SGML is called an SGML application. An SGML application is generally characterized by:
An SGML declaration.
The SGML declaration specifies which characters and delimiters may appear in the application.
A DTD
A specification that describes the semantics to be ascribed to the markup. This specification also imposes syntax restrictions that cannot be expressed within the DTD.
Document instances containing data (content) and markup. Each instance contains a reference to the DTD to be used to interpret it.
10. Seen from a DTD point of view, all XML documents (and HTML documents) are made up of the following simple building blocks:
Elements
Tags
Attributes
Entities
PCDATA
CDATA
11. Elements Elements are the main building blocks of both XML and HTML documents.
Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".
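As a rough illustration of the three cases above (text, other elements, or empty), the sketch below classifies each element in a small document. The sample markup and the describe helper are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "note" contains elements, "message" contains text,
# and "br" is empty.
doc = ET.fromstring("<note><message>Hello</message><br /></note>")

def describe(element):
    """Classify an element by what it contains."""
    if len(element) > 0:                      # has child elements
        return "contains elements"
    if element.text and element.text.strip():  # has non-blank text
        return "contains text"
    return "empty"

kinds = {element.tag: describe(element) for element in doc.iter()}
print(kinds)
```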
12. Tags Tags are used to mark up elements.
A starting tag like <element_name> marks up the beginning of an element, and an ending tag like </element_name> marks up the end of an element.
Examples:
body element marked up with body tags:
<body>body text in between</body>.
message element marked up with message tags:
<message>some message in between</message>
13. Attributes Attributes provide extra information about elements.
Attributes are always placed inside the starting tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty it is closed by a " /".
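The same "img" element can be inspected programmatically. This is a small sketch using Python's standard xml.etree.ElementTree module; it treats the element as XML, which works here because the empty element is explicitly closed.

```python
import xml.etree.ElementTree as ET

# Parse the empty "img" element from the slide.
img = ET.fromstring('<img src="computer.gif" />')

element_name = img.tag          # name of the element
attributes = dict(img.attrib)   # all name/value pairs in the start tag
source = img.get("src")         # value of one named attribute

print(element_name, attributes, source)
```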
14. Entities Entities are variables used to define common text.
Entity references are references to entities.
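A minimal sketch of an entity and two entity references follows. The "writer" entity and the note text are hypothetical. The entity is declared in the document's internal DTD, and &amp; is one of the entities predefined by XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "writer" is declared as an entity in the internal
# DTD, then referenced in the text as &writer;.
ENTITY_XML = """<!DOCTYPE note [
  <!ENTITY writer "Donald Duck">
]>
<note>Written by &writer; &amp; friends</note>"""

note = ET.fromstring(ENTITY_XML)

# The parser expands both entity references while reading the text.
print(note.text)
```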
15. PCDATA PCDATA means parsed character data.
Think of character data as the text found between the start tag and the end tag of an XML element.
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.
16. CDATA CDATA also means character data.
CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
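The difference can be seen by parsing the same characters twice, once as ordinary element content and once inside a CDATA section. This is a sketch with a hypothetical "message" element; using a CDATA section to carry the unparsed text is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# Parsed character data: <b> becomes a child element, &amp; is expanded.
parsed = ET.fromstring("<message>Use <b>bold</b> &amp; more</message>")

# Unparsed character data: the same characters inside a CDATA section are
# kept literally, with no markup recognized and no entities expanded.
literal = ET.fromstring(
    "<message><![CDATA[Use <b>bold</b> &amp; more]]></message>"
)

print([child.tag for child in parsed], repr(parsed[0].tail))
print(repr(literal.text), len(literal))
```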
17. What is HTML? HTML is a non-proprietary format based upon SGML, and can be created and processed by a wide range of tools, from simple plain-text editors, where you type it in from scratch, to sophisticated WYSIWYG authoring tools.
19. <H1>Heading 1</H1> <H2>Heading 2</H2> <H3>Heading 3</H3> <H4>Heading 4</H4>
20. What is XML? Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML.
Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.
21. XML Sample
<?xml version="1.0"?>
<order orderid="THX1138" customerNumber="3263827">
  <lineitem itemid="C33">
    <quantity>36</quantity>
    <unitprice currency="dollars">.35</unitprice>
  </lineitem>
  <lineitem itemid="M48">
    <quantity>1</quantity>
    <unitprice currency="dollars">2200</unitprice>
  </lineitem>
</order>
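To show how an application might consume such a document, the sketch below parses the order and totals its line items. The total calculation is an illustration added here, not part of the original sample, and Decimal is used so that the currency arithmetic stays exact.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# The order document from the slide, with straight quotes throughout.
ORDER_XML = """<?xml version="1.0"?>
<order orderid="THX1138" customerNumber="3263827">
  <lineitem itemid="C33">
    <quantity>36</quantity>
    <unitprice currency="dollars">.35</unitprice>
  </lineitem>
  <lineitem itemid="M48">
    <quantity>1</quantity>
    <unitprice currency="dollars">2200</unitprice>
  </lineitem>
</order>"""

order = ET.fromstring(ORDER_XML)

# Sum quantity * unit price over every line item.
total = Decimal("0")
for item in order.findall("lineitem"):
    quantity = int(item.findtext("quantity"))
    unit_price = Decimal(item.findtext("unitprice"))
    total += quantity * unit_price

item_ids = [item.get("itemid") for item in order.findall("lineitem")]
print(order.get("orderid"), item_ids, total)
```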
22. XML/HTML XML is not a replacement for HTML. XML and HTML were designed with different goals:
XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks.
HTML is about displaying information, XML is about describing information.
23. XML/HTML
The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of an HTML document can only use tags that are defined in the HTML standard.
XML allows authors to define their own tags and their own document structure.
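As a small sketch of author-defined tags, the code below builds a document whose vocabulary ("recipe", "ingredient") is invented purely for this example. No standard defines these names.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the tag and attribute names are made up by the
# author of the document, which XML permits.
recipe = ET.Element("recipe", {"name": "pancakes"})
flour = ET.SubElement(recipe, "ingredient", {"unit": "cups"})
flour.text = "2"

xml_text = ET.tostring(recipe, encoding="unicode")
print(xml_text)

# Any XML parser can read the result back without knowing the vocabulary.
reparsed = ET.fromstring(xml_text)
```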
24. What is XHTML? The Extensible HyperText Markup Language (XHTML) is a family of current and future document types and modules that reproduce, subset, and extend HTML, reformulated in XML.
XHTML Family document types are all XML-based, and ultimately are designed to work in conjunction with XML-based user agents.
XHTML was designed as the successor to HTML, and a series of specifications has been developed for it.
25. SMIL SMIL authoring offers a new way to assemble and deliver streaming multimedia presentations. Rather than the traditional way of creating a presentation by compiling a set of media into a single distributable file, SMIL lets authors choreograph separate media assets quickly and easily, with tools as simple as a text editor. Perhaps the best feature of SMIL is the ability to generate the code on-the-fly, as many Web pages are already created, and thereby offer personalized streaming multimedia.
26. SVG SVG is a language for describing two-dimensional graphics in XML.
SVG allows for three types of graphic objects:
vector graphic shapes (e.g., paths consisting of straight lines and curves)
images
text
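The three object types can be sketched in one small SVG document generated with Python's standard library. The coordinates, the image file name, and the label text are hypothetical. The plain href attribute on the image follows SVG 2, whereas SVG 1.1 viewers expect xlink:href instead.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Root element; all sizes and coordinates below are illustrative only.
svg = ET.Element("svg", {"xmlns": SVG_NS, "width": "200", "height": "100"})

# 1. A vector graphic shape: a path made of one cubic Bezier curve.
ET.SubElement(svg, "path", {
    "d": "M 10,80 C 40,10 160,10 190,80",
    "stroke": "black",
    "fill": "none",
})

# 2. An image (hypothetical file name).
ET.SubElement(svg, "image", {
    "href": "computer.gif",
    "x": "10", "y": "10", "width": "32", "height": "32",
})

# 3. Text.
label = ET.SubElement(svg, "text", {"x": "60", "y": "95"})
label.text = "Hello, SVG"

svg_text = ET.tostring(svg, encoding="unicode")
print(svg_text)
```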
27. SVG Sample
<symbol id="whiteYellowBezier" overflow="visible">
  <path style="stroke:black;fill:none;" d="M 0,0 C 0.25,-0.1 0.75,-0.1 1,0">
    <animate id="whiteYellowBezierAnim" attributeName="d" values="M 0,0 C 0.25,-0.1 0.75,-0.1 1,0; M 0,0 C25,-10 75,-10 100,0" dur="5s" repeatCount="3"/>
    <animate attributeName="stroke-width" values="1;3" dur="5s" repeatCount="3" />
    <animate attributeName="stroke" values="white;yellow" dur="5s" repeatCount="3" />
  </path>
</symbol>
29. WML (Wireless Markup Language) An annotation technique that allows the text portions of Web pages to be presented on cellular telephones and personal digital assistants (PDAs) via wireless access.
Although HTML could be used, WML is preferred because it requires less bandwidth.
WML also requires less processing power than HTML.
30. TEI (Text Encoding Initiative)
An international project to develop guidelines for the preparation and interchange of electronic texts for scholarly research.
Supported and promoted the use of SGML.
Future Direction
31. W3C (World Wide Web Consortium)
Vision: Contributions from several hundred dedicated researchers and engineers working for Member organizations, from the W3C Team, and from the entire Web community enable W3C to identify the technical requirements that must be satisfied if the Web is to be a truly universal information space.
Design: W3C designs Web technologies to realize this vision, taking into account existing technologies as well as those of the future.
Standardization: W3C contributes to efforts to standardize Web technologies by producing specifications (called "Recommendations") that describe the building blocks of the Web. W3C makes these Recommendations freely available to all.
32. Paper Critique