
Text Annotation Techniques

Annotated Text. Ordinary text (e.g.): This is an ordinary text document. Annotated text (e.g.): <html><title>Sample Document</title><body>This is an annotated text document.</body></html>


Presentation Transcript


    1. Text Annotation Techniques by Brian Wanner

    2. Annotated Text

    3. Document Type Definition (DTD) A DTD is a specification that accompanies an annotated document. It helps the parser identify the codes (markup) that separate paragraphs or mark topic headings. It also tells the parser how each tag is to be processed. The DTD is generally placed at the top of the document.

    4. DTD Example This is an XML document with a Document Type Definition. PCDATA (Parsed Character Data) is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.
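
The example document itself did not survive in this transcript. As a rough stand-in, the Python sketch below embeds a hypothetical `note` document with an internal DTD and reads it with the standard library's `xml.dom.minidom`. The element names (`note`, `to`, `body`) are illustrative assumptions, not taken from the slide, and this parser reads the DTD without validating the document against it.

```python
from xml.dom import minidom

# Hypothetical document: the DTD sits at the top, inside <!DOCTYPE ...>.
NOTE_XML = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Reader</to>
  <body>Markup separates structure from text.</body>
</note>"""

doc = minidom.parseString(NOTE_XML)
doctype_name = doc.doctype.name            # root element named by the DOCTYPE
root_name = doc.documentElement.tagName    # actual root element of the document
body_text = doc.getElementsByTagName("body")[0].firstChild.data

print(doctype_name, root_name, body_text)
```

Checking a document against its DTD would need a validating parser, which the Python standard library does not provide.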

    5. Why use DTD? Each file can carry a description of its own format with it. Independent groups of people can agree to use a common DTD for interchanging data. Your application can use a standard DTD to verify that the data you receive from the outside world is valid.

    6. Standard Generalized Markup Language (SGML) SGML is a metalanguage: a language for writing other languages in. SGML is used to define the abstract structure of a DTD.

    7. SGML Each markup language defined in SGML is called an SGML application. An SGML application is generally characterized by four parts: an SGML declaration, which specifies which characters and delimiters may appear in the application; a DTD; a specification that describes the semantics to be ascribed to the markup and imposes syntax restrictions that cannot be expressed within the DTD; and document instances containing data (content) and markup. Each instance contains a reference to the DTD to be used to interpret it.

    10. Seen from a DTD point of view, all XML documents (and HTML documents) are made up of the following simple building blocks: elements, tags, attributes, entities, PCDATA, and CDATA.

    11. Elements Elements are the main building blocks of both XML and HTML documents. Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".

    12. Tags Tags are used to mark up elements. A starting tag like <element_name> marks the beginning of an element, and an ending tag like </element_name> marks the end of an element. Examples: body element marked up with body tags: <body>body text in between</body>. message element marked up with message tags: <message>some message in between</message>
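
To see how a parser turns tags into elements, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `note` wrapper and its contents are assumptions made for illustration; only the `message` example comes from the slide.

```python
import xml.etree.ElementTree as ET

# The start tag <message> and end tag </message> delimit one element.
message = ET.fromstring("<message>some message in between</message>")

# Elements can contain text, other elements, or be empty.
note = ET.fromstring("<note><message>hi</message><br /></note>")
child_names = [child.tag for child in note]
empty = note.find("br")

print(message.tag, "->", message.text)
print(child_names, empty.text, len(empty))
```

The empty element has neither text nor children, which is why `empty.text` is `None` and `len(empty)` is zero.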

    13. Attributes Attributes provide extra information about elements. Attributes are always placed inside the starting tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file: <img src="computer.gif" /> The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty it is closed by a " /".
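
The same `img` element can be inspected programmatically. This is a small sketch with Python's `xml.etree.ElementTree`; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Parse the empty element from the slide; "/>" closes it in place.
img = ET.fromstring('<img src="computer.gif" />')

element_name = img.tag           # name of the element
attributes = dict(img.attrib)    # name/value pairs from the start tag

print(element_name, attributes)
```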

    14. Entities Entities are variables used to define common text. Entity references are references to entities.
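
As a hedged illustration of entity expansion, the Python sketch below declares a hypothetical entity named `course` in an internal DTD and lets `xml.etree.ElementTree` expand it. The entity name and its text are assumptions chosen for the example; `&lt;` and `&gt;` are among the entities predefined by XML.

```python
import xml.etree.ElementTree as ET

# &course; is declared in the DTD; &lt; and &gt; are predefined by XML.
DOC = """<!DOCTYPE note [
  <!ENTITY course "Text Annotation Techniques">
]>
<note>Welcome to &course;. Use &lt;tag&gt; syntax.</note>"""

note = ET.fromstring(DOC)
expanded = note.text   # entity references replaced by their text

print(expanded)
```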

    15. PCDATA PCDATA means parsed character data. Think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded. 

    16. CDATA CDATA also means character data. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
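
The difference between the two can be observed directly. The sketch below assumes the slide's "CDATA" refers to text inside a `<![CDATA[ ... ]]>` section and uses Python's `xml.etree.ElementTree`; the `msg` element is a made-up name.

```python
import xml.etree.ElementTree as ET

# PCDATA: the parser treats <b> as markup and expands &amp;.
parsed = ET.fromstring("<msg><b>bold</b> &amp; more</msg>")

# CDATA section: the same characters are passed through as literal text.
literal = ET.fromstring("<msg><![CDATA[<b>bold</b> &amp; more]]></msg>")

print(len(parsed), repr(parsed[0].tail))   # one child element, entity expanded
print(len(literal), repr(literal.text))    # no child elements, text kept verbatim
```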

    17. What is HTML? HTML is a non-proprietary format based upon SGML, and can be created and processed by a wide range of tools, from simple plain text editors (you type it in from scratch) to sophisticated WYSIWYG authoring tools.

    19. <H1>Heading 1</H1> <H2>Heading 2</H2> <H3>Heading 3</H3> <H4>Heading 4</H4>
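
Because these heading tags are predefined by HTML, a tool can pick them out by name. The following is a minimal sketch using Python's standard `html.parser`; the `HeadingCollector` class is a hypothetical helper written for this example.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collects (tag, text) pairs for heading elements h1..h6."""
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.current = None    # heading tag currently open, if any
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:    # tag names arrive lowercased
            self.current = tag

    def handle_data(self, data):
        if self.current:
            self.headings.append((self.current, data))

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

collector = HeadingCollector()
collector.feed("<H1>Heading 1</H1> <H2>Heading 2</H2> "
               "<H3>Heading 3</H3> <H4>Heading 4</H4>")
collector.close()
headings = collector.headings

print(headings)
```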

    20. What is XML? Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML. Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.

    21. XML Sample <?xml version="1.0"?> <order orderid="THX1138" customerNumber="3263827"> <lineitem itemid="C33"> <quantity>36</quantity> <unitprice currency="dollars">.35</unitprice> </lineitem> <lineitem itemid="M48"> <quantity>1</quantity> <unitprice currency="dollars">2200</unitprice> </lineitem> </order>
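
Because the sample describes data rather than presentation, a program can compute with it. The Python sketch below re-types the order with plain ASCII quotes and totals the line items; the totalling logic is an illustration added here, not part of the slide.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

ORDER_XML = """<?xml version="1.0"?>
<order orderid="THX1138" customerNumber="3263827">
  <lineitem itemid="C33">
    <quantity>36</quantity>
    <unitprice currency="dollars">.35</unitprice>
  </lineitem>
  <lineitem itemid="M48">
    <quantity>1</quantity>
    <unitprice currency="dollars">2200</unitprice>
  </lineitem>
</order>"""

order = ET.fromstring(ORDER_XML)
item_ids = [item.get("itemid") for item in order.findall("lineitem")]

# Sum quantity * unit price, using Decimal to avoid float rounding.
total = Decimal(0)
for item in order.findall("lineitem"):
    quantity = int(item.findtext("quantity"))
    price = Decimal(item.findtext("unitprice"))
    total += quantity * price

print(order.get("orderid"), item_ids, total)
```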

    22. XML/HTML XML is not a replacement for HTML. XML and HTML were designed with different goals: XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks. HTML is about displaying information; XML is about describing information.

    23. XML/HTML The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of an HTML document can only use tags that are defined in the HTML standard. XML allows authors to define their own tags and their own document structure.
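
A short sketch of author-defined tags, using Python's `xml.etree.ElementTree`. The `recipe` vocabulary below is invented for this example; XML itself prescribes none of these names.

```python
import xml.etree.ElementTree as ET

# Tag and attribute names are chosen by the author, not predefined by XML.
recipe = ET.Element("recipe", {"serves": "2"})
ET.SubElement(recipe, "ingredient").text = "flour"
ET.SubElement(recipe, "ingredient").text = "water"

serialized = ET.tostring(recipe, encoding="unicode")
print(serialized)
```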

    24. What is XHTML? The Extensible HyperText Markup Language (XHTML) is a family of current and future document types and modules that reproduce, subset, and extend HTML, reformulated in XML. XHTML Family document types are all XML-based, and ultimately are designed to work in conjunction with XML-based user agents. XHTML is the successor of HTML, and a series of specifications has been developed for XHTML.
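
One practical consequence of reformulating HTML in XML is that an XHTML document must be well-formed XML. The sketch below uses a hypothetical helper, `is_well_formed`, to contrast an unclosed HTML-style `<br>` with its XHTML-style equivalent.

```python
import xml.etree.ElementTree as ET

html_style = "<p>first line<br>second line</p>"      # <br> is never closed
xhtml_style = "<p>first line<br />second line</p>"   # empty element closed XML-style

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(html_style), is_well_formed(xhtml_style))
```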

    25. SMIL SMIL authoring offers a new way to assemble and deliver streaming multimedia presentations. Rather than the traditional way of creating a presentation by compiling a set of media into a single distributable file, SMIL lets authors choreograph separate media assets quickly and easily, with tools as simple as a text editor. Perhaps the best feature of SMIL is the ability to generate the code on-the-fly, as many Web pages are already created, and thereby offer personalized streaming multimedia.

    26. SVG SVG is a language for describing two-dimensional graphics in XML. SVG allows for three types of graphic objects: vector graphic shapes (e.g., paths consisting of straight lines and curves), images, and text.
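
The three object types can be sketched by generating a tiny SVG document in Python. All coordinates, the label text, and the `computer.gif` file name are placeholder assumptions; the plain `href` attribute follows SVG 2, whereas SVG 1.1 documents use `xlink:href`.

```python
import xml.etree.ElementTree as ET

svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg",
                         "width": "200", "height": "120"})

# 1. Vector graphic shape: a path made of a straight line and a curve.
ET.SubElement(svg, "path", {"d": "M 10,60 L 60,60 C 80,20 120,20 140,60",
                            "stroke": "black", "fill": "none"})

# 2. Image: the file name is a placeholder.
ET.SubElement(svg, "image", {"href": "computer.gif", "x": "150", "y": "10",
                             "width": "40", "height": "40"})

# 3. Text.
label = ET.SubElement(svg, "text", {"x": "10", "y": "100"})
label.text = "Hello, SVG"

svg_source = ET.tostring(svg, encoding="unicode")
object_types = [child.tag for child in svg]

print(object_types)
print(svg_source)
```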

    27. SVG Sample <symbol id="whiteYellowBezier" overflow="visible"> <path style="stroke:black;fill:none;" d="M 0,0 C 0.25,-0.1 0.75,-0.1 1,0"> <animate id="whiteYellowBezierAnim" attributeName="d" values="M 0,0 C 0.25,-0.1 0.75,-0.1 1,0; M 0,0 C25,-10 75,-10 100,0" dur="5s" repeatCount="3"/> <animate attributeName="stroke-width" values="1;3" dur="5s" repeatCount="3" /> <animate attributeName="stroke" values="white;yellow" dur="5s" repeatCount="3" /> </path> </symbol>

    29. WML WML (Wireless Markup Language) is an annotation technique that allows the text portions of Web pages to be presented on cellular telephones and personal digital assistants (PDAs) via wireless access. Although HTML could be used, WML is preferred because it requires less bandwidth. WML also takes less processing power than HTML.

    30. TEI (Text Encoding Initiative) An international project to develop guidelines for the preparation and interchange of electronic texts for scholarly research. It supported and promoted the use of SGML. Future Direction

    31. W3C (World Wide Web Consortium) Vision: Contributions from several hundred dedicated researchers and engineers working for Member organizations, from the W3C Team, and from the entire Web community enable W3C to identify the technical requirements that must be satisfied if the Web is to be a truly universal information space. Design: W3C designs Web technologies to realize this vision, taking into account existing technologies as well as those of the future. Standardization: W3C contributes to efforts to standardize Web technologies by producing specifications (called "Recommendations") that describe the building blocks of the Web. W3C makes these Recommendations freely available to all.

    32. Paper Critique
