DMT Week 3

Leiden University. The university to discover. Adriaan van der Weel and Peter Verhaar.


Presentation Transcript


  1. Leiden University. The university to discover. DMT Week 3 • Adriaan van der Weel and Peter Verhaar

  2. Where do we stand?

  3. Principles of markup • HTML: • Document instance (your CV) • Stylesheet (CSS) • Application • Document instance (your CV) • Stylesheet (CSS) • DTD/Schema • Add: Prologue (XML decl.; DTD)

  4. Text and markup

  5. Knowledge representation • Structure and content • Ontology • What knowable things exist • What are the relationships that hold between them • Tree diagram • The book has structure and content: chapters, paragraphs, footnotes, etc. • XML represents structure and content • Various ontologies - various DTDs
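The tree view of a book can be sketched in a few lines of Python. This is only an illustration: the element names (`book`, `chapter`, `paragraph`, `footnote`) and the sample text are assumptions chosen to mirror the slide, not a DTD used in the course.

```python
import xml.etree.ElementTree as ET

# A hypothetical miniature book: structure (tags) wrapped around content (text).
BOOK = """<book>
  <chapter n="1">
    <paragraph>First paragraph.<footnote>A note.</footnote></paragraph>
    <paragraph>Second paragraph.</paragraph>
  </chapter>
  <chapter n="2">
    <paragraph>Third paragraph.</paragraph>
  </chapter>
</book>"""


def outline(element, depth=0):
    """Return the element tree as indented lines, one line per element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BOOK)
print("\n".join(outline(root)))
```

Printing the outline shows the nesting as a tree diagram in text form: each level of indentation is one step down from the root element.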

  6. XML Basics 1 • Elements <p>...</p> • Attributes <title type="play">...</title> • Entities • Character: &#xE8; = è • General entities, referencing: • Chunks of text defined elsewhere • Text or image files, etc. • E.g., <p>The &BTCP; aims to ... </p> • Well-formedness, validation • Prologue (XML decl.; DTD)
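The pieces listed on this slide can be tried out with Python's standard library parser. The sketch below makes several assumptions: the root element `text`, the title text, and the replacement text "Example Project" for the `BTCP` entity are all placeholders, since the slide does not say what `&BTCP;` expands to. Note also that `xml.etree.ElementTree` only checks well-formedness; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Prologue (XML declaration + DOCTYPE with an internal entity declaration),
# followed by the document instance. The entity text is a placeholder.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE text [
  <!ENTITY BTCP "Example Project">
]>
<text>
  <title type="play">Ph&#xE8;dre</title>
  <p>The &BTCP; aims to ... </p>
</text>
"""


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(DOC.encode("utf-8"))
title = root.find("title")
para = root.find("p")

print(title.get("type"))  # attribute value
print(title.text)         # character reference &#xE8; resolved by the parser
print(para.text)          # general entity &BTCP; replaced by its declared text

# Well-formedness failures: a missing end tag, an unquoted attribute value.
print(is_well_formed("<p>unclosed"))
print(is_well_formed("<title type=play>x</title>"))
```

The last two checks both fail: XML requires every start tag to be closed and every attribute value to be quoted.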

  7. XML Basics 2 • Open standard (cf. de facto standard): • Publicly available • Royalty-free • Fully and publicly documented • NB: ‘Who owns your data?’ • (Lower) ASCII and Unicode: • Platform independent • Software independent • Device independent

  8. Open standards 1 • Open standards in a networking world • Why? • Which? E.g., Internet Protocol Suite: • Link layer (physical/data, e.g., Ethernet) • Internet layer, facilitating transport, e.g., IP • Transport layer, e.g., TCP • Application layer, e.g., HTTP, SMTP, FTP

  9. Open standards 2 • E.g.: • File format: PDF, txt • Programming language: PHP • Operating system: Linux • Style language: CSS, XSLT • Markup metalanguage: SGML, XML • Markup language: DocBook, HTML, EAD, TEI

  10. TEI basics • Text Encoding Initiative, 1987 • Text exchange in the humanities • TEI is a DTD, or more precisely a collection of DTD fragments or modules • Platform and software independent (ASCII); open standard; open source • Used in an XML application (diagram) • Document ‘instances’ should be validated against the TEI DTD

  11. TEI DTD • The TEI DTD is modular. We use:
      <!DOCTYPE TEI PUBLIC "-//TEI P5//DTD Main Document Type//EN" "http://www.tei-c.org/release/xml/tei/schema/dtd//tei.dtd" [
      <!ENTITY % TEI.header "INCLUDE">
      <!ENTITY % TEI.core "INCLUDE">
      <!ENTITY % TEI.textstructure "INCLUDE">
      <!ENTITY % TEI.transcr "INCLUDE">
      <!ENTITY % TEI.linking "INCLUDE">
      <!ENTITY % TEI.namesdates "INCLUDE">
      ]>
      http://www.tei-c.org/release/xml/tei/schema/dtd/

  12. Why this rigmarole? • Print (‘Order of the Book’): • Author’s brain > Book > reader’s brain • Instrument: typography • Digital (‘Digital Order’?): • Author’s brain > Computer > reader’s brain • Instrument: markup • For both typography (= form) and content • So: Need to make text intelligent

  13. Using the computer / UM • Author’s brain > Computer > reader’s brain • Vary output format (paper, pdf, html, mobile phone, etc.) • Exchange • Reuse • Search and select • Count • Change content (order) and form • Etcetera
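Several of these uses can be illustrated with one small marked-up source. The following is a minimal sketch under stated assumptions: the element names (`text`, `head`, `p`), the sample sentences, and the helper functions `to_plain` and `to_html` are all hypothetical and stand in for whatever encoding and output formats a real project would use.

```python
import xml.etree.ElementTree as ET
from html import escape

# One hypothetical source document; every output below is derived from it.
SOURCE = """<text>
  <head>Sample text</head>
  <p>Markup records structure.</p>
  <p>Typography records form.</p>
  <p>One source can serve many outputs.</p>
</text>"""

root = ET.fromstring(SOURCE)
paragraphs = [p.text for p in root.findall("p")]


def to_plain(doc):
    """Vary output format: plain text with an upper-cased heading."""
    return "\n\n".join([doc.findtext("head").upper()] + [p.text for p in doc.findall("p")])


def to_html(doc):
    """Vary output format: a simple HTML fragment."""
    parts = ["<h1>" + escape(doc.findtext("head")) + "</h1>"]
    parts += ["<p>" + escape(p.text) + "</p>" for p in doc.findall("p")]
    return "\n".join(parts)


# Count: how many paragraphs does the text contain?
count = len(paragraphs)

# Search and select: only the paragraphs that mention "records".
hits = [text for text in paragraphs if "records" in text]

# Change order: the same content in reverse sequence.
reordered = list(reversed(paragraphs))

print(to_plain(root))
print(to_html(root))
print(count, len(hits), reordered[0])
```

Because structure is recorded explicitly in the markup, each operation works on named elements rather than on visual layout.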

  14. New research questions? • Chris Anderson (The Long Tail), in Wired: ‘The End of Theory’ • But: need for hypothesis remains • But: humanities data: • Quantity: not such a wealth of data. Bitty. Discontinuous. • Quality: narrative, evaluative, ambiguous, subjective, conceptual • Who decides the agenda? Need to lead, rather than follow.
