1. Text Annotation Techniques Bill Bruno
Rob LaPlaca
2. What is an “annotated” text?
3. Key Methods DTD
SGML
HTML
XML
WML
TEI
4. Document Type Definition
5. Standard Generalized Markup Language
6. HTML Hyper Text Markup Language
Symbols used to mark up web pages.
Markup tells web browser how to display pictures and text.
Markups are called elements. Some elements come in pairs.
7. Basic Annotations in HTML Document Tags
HTML, HEAD, BODY
Basic Text Structures
Headings, Paragraphs, etc.
Anchors
HREF and Name
Images
IMG, ALIGN, ALT
8. Sample HTML Code <html>
<title> Sample Document
</title>
<body>
<p> This is a sample HTML document.</p>
<p>It illustrates the usage of tags with the actual text.</p>
</body>
</html>
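A minimal sketch of how a program sees the markup above, assuming Python's standard `html.parser` module as the reader (the slides do not name a tool; the class name `TagLogger` is hypothetical). It records each start and end tag so the paired elements become visible:

```python
from html.parser import HTMLParser

# The sample document from the slide, copied as-is.
SAMPLE = """<html>
<title> Sample Document
</title>
<body>
<p> This is a sample HTML document.</p>
<p>It illustrates the usage of tags with the actual text.</p>
</body>
</html>"""


class TagLogger(HTMLParser):
    """Record every start tag and end tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


logger = TagLogger()
logger.feed(SAMPLE)
logger.close()

print("start tags:", logger.opened)
print("end tags:  ", logger.closed)
```

Every start tag in this sample has a matching end tag, which is what "elements come in pairs" means in practice.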
9. HTML Specifics Dedicated HTML editors exist, but a word processor such as Word can also be used to view it.
Tag names are not case sensitive.
Standardized code, can be viewed with different browsers.
10. XML Extensible Markup Language
It is a flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere.
11. XML An element of XML is a start tag, an end tag and data between.
<director>Ed Wood</director>
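Attributes may also be assigned to an element inside its start tag. An attribute needs a name as well as a value (the name "location" here is illustrative).
<director location="Hollywood">Ed Wood</director>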
XML tags are case sensitive.
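The points on this slide can be checked with a short sketch, assuming Python's standard `xml.etree.ElementTree` module. The wrapper element `movie` and the attribute name `location` are assumptions added for illustration; an XML attribute always needs a name, so one is supplied here.

```python
import xml.etree.ElementTree as ET

# A start tag, an end tag, and data between them, plus one named attribute.
movie = ET.fromstring(
    '<movie><director location="Hollywood">Ed Wood</director></movie>'
)

director = movie.find("director")
print(director.tag)     # element name
print(director.text)    # data between the start and end tags
print(director.attrib)  # attributes given in the start tag

# Tag names are case sensitive: "Director" is a different name
# from "director", so this lookup finds nothing.
upper_lookup = movie.find("Director")
print(upper_lookup)
```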
12. Sample XML Code <?xml version="1.0"?>
<doc>
<burns>Say<quote>goodnight</quote>,
Gracie.</burns>
<allen><quote>Goodnight,
Gracie.</quote></allen>
<applause/>
</doc>
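As a rough illustration of how the sample above reads as a tree, the sketch below parses it with Python's standard `xml.etree.ElementTree` (an assumed tool, not one named in the slides). It lists the child elements, pulls out the quoted text, and shows that `<applause/>` is an empty element.

```python
import xml.etree.ElementTree as ET

# The sample document from the slide, copied as-is.
DOC = """<?xml version="1.0"?>
<doc>
<burns>Say<quote>goodnight</quote>,
Gracie.</burns>
<allen><quote>Goodnight,
Gracie.</quote></allen>
<applause/>
</doc>"""

root = ET.fromstring(DOC)

# Direct children of <doc>, in document order.
children = [child.tag for child in root]

# Text of every <quote>, with line breaks collapsed to single spaces.
quotes = [" ".join(q.text.split()) for q in root.iter("quote")]

# <burns> mixes text and markup: text before the child element is
# stored on the parent, text after it is stored as the child's "tail".
burns = root.find("burns")
before_quote = burns.text
after_quote = " ".join(burns.find("quote").tail.split())

# An empty element has no text and no children.
applause = root.find("applause")

print(children)
print(quotes)
print(before_quote, "|", after_quote)
print(applause.text, len(applause))
```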
13. Sample XML Code <?xml version="1.0"?>
<!DOCTYPE PARENT [
<!ELEMENT PARENT (CHILD*)>
<!ELEMENT CHILD (MARK?,NAME+)>
<!ELEMENT MARK EMPTY>
<!ELEMENT NAME (LASTNAME+,FIRSTNAME+)*>
<!ELEMENT LASTNAME (#PCDATA)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ATTLIST MARK
NUMBER ID #REQUIRED
LISTED CDATA #FIXED "yes"
TYPE (natural|adopted) "natural">
<!ENTITY STATEMENT "This is well-formed XML">
]>
14. Sample XML Code <PARENT>
&STATEMENT;
<CHILD>
<MARK NUMBER="1" LISTED="yes" TYPE="natural"/>
<NAME>
<LASTNAME>child</LASTNAME>
<FIRSTNAME>second</FIRSTNAME>
</NAME>
</CHILD>
</PARENT>
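Slides 13 and 14 together form one document: the internal DTD followed by the body. The sketch below joins them and parses the result with Python's standard `xml.etree.ElementTree`, which is an assumption on my part. That parser checks well-formedness only and does not validate against the DTD, but it does expand the internal `STATEMENT` entity.

```python
import xml.etree.ElementTree as ET

# DTD (slide 13) and body (slide 14) joined into a single document.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE PARENT [
<!ELEMENT PARENT (CHILD*)>
<!ELEMENT CHILD (MARK?,NAME+)>
<!ELEMENT MARK EMPTY>
<!ELEMENT NAME (LASTNAME+,FIRSTNAME+)*>
<!ELEMENT LASTNAME (#PCDATA)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ATTLIST MARK
NUMBER ID #REQUIRED
LISTED CDATA #FIXED "yes"
TYPE (natural|adopted) "natural">
<!ENTITY STATEMENT "This is well-formed XML">
]>
<PARENT>
&STATEMENT;
<CHILD>
<MARK NUMBER="1" LISTED="yes" TYPE="natural"/>
<NAME>
<LASTNAME>child</LASTNAME>
<FIRSTNAME>second</FIRSTNAME>
</NAME>
</CHILD>
</PARENT>"""

root = ET.fromstring(DOCUMENT)

# &STATEMENT; is replaced by the text declared in the DTD.
statement = root.text.strip()

mark = root.find("CHILD/MARK")
name = root.find("CHILD/NAME")

print(statement)
print(mark.attrib)
print(name.find("LASTNAME").text, name.find("FIRSTNAME").text)
```

Checking the body against the element and attribute declarations would need a validating parser, which this sketch does not use.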
15. Differences Between HTML and XML XML contains tags that describe the data
<phoneno> may describe a telephone number.
Supports links to multiple documents.
A forgotten end tag makes an XML file unusable because the parser rejects the whole document, whereas an HTML browser can usually work around the omission.
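The difference in error handling can be sketched with two parsers from Python's standard library, used here as stand-ins: `xml.etree.ElementTree` for an XML parser and `html.parser` for a tolerant HTML reader (it is not a browser, so this is only an approximation). The input string and the class name `TextCollector` are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The <p> element is never closed.
BROKEN = "<doc><p>The closing tag for this paragraph is missing.</doc>"

# An XML parser refuses the document outright.
try:
    ET.fromstring(BROKEN)
    xml_error = None
except ET.ParseError as err:
    xml_error = str(err)


class TextCollector(HTMLParser):
    """Collect the text content, ignoring how tags are nested."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# The HTML-style parser reads the same input without complaint.
collector = TextCollector()
collector.feed(BROKEN)
collector.close()
html_text = "".join(collector.chunks)

print("XML parser: ", xml_error)
print("HTML parser:", html_text)
```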
16. Benefits of XML Meaningful markup.
A single approach accommodates both document and data structures and integrates them within documents.
Enables transfer of data between applications.
Structural similarity to HTML simplifies implementation with traditional web servers, browser applications, CGI, and Java.
17. Benefits of XML Files can be processed purely as data, so they can be stored or displayed.
Files are plain text and verbose, which makes debugging easy.
It’s license-free, platform independent & well supported.
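To make "transfer of data between applications" concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. One function plays the sending application and another the receiving one. The element name `contact`, the function names, and the sample values are all hypothetical; `phoneno` is borrowed from the earlier slide.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Sending side: turn a flat dict into an XML string."""
    contact = ET.Element("contact")
    for field, value in record.items():
        ET.SubElement(contact, field).text = value
    return ET.tostring(contact, encoding="unicode")


def from_xml(text):
    """Receiving side: rebuild the dict from the XML string."""
    return {child.tag: child.text for child in ET.fromstring(text)}


# Hypothetical sample data.
record = {"name": "Ed Wood", "phoneno": "555-0100"}

payload = to_xml(record)      # plain text, readable while debugging
received = from_xml(payload)  # same data on the other side

print(payload)
print(received)
```

Because the payload is ordinary text with descriptive tags, it can be stored, displayed, or inspected by eye at any point between the two applications.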
19. WML Wireless Markup Language.
Allows the text portions of web pages to be viewed on cell phones and PDAs.
Part of the Wireless Application Protocol.
Grew out of HDML (Handheld Device Markup Language).
20. WML Read in browsers, similar to HTML and XML.
WAP devices use a microbrowser, which works like a regular web browser but has limited features.
HTML could be used, but WML is better suited to low-bandwidth connections.
WML needs less processing power than HTML.
21. Text Encoding Initiative
22. Need for a common encoding scheme Until the TEI project was undertaken, there was no common encoding format for scholarly machine-readable texts.
None of the existing encoding schemes had been able to gain acceptance as a standard.
23. Origin of TEI & factors contributing to it TEI arose out of a planning conference convened by the Association for Computers and the Humanities (ACH) at Vassar College, Poughkeepsie, New York, in November 1987.
Factor I : More is known now about the problems of text encoding than at the time of previous attempts
Factor II : The recently developed Standard Generalized Markup Language (SGML) seemed to be the ideal text-encoding scheme.
24. Objectives of TEI
25. Why TEI chose SGML?
26. Critique Straightforward presentation
More examples would be helpful
Research required to fully understand some points