Linkable data, linked data and texts What have the Digital Humanities to offer, based on the CIDOC-CRM and TEI. Christian-Emil Ore University of Oslo NTNU 02.11.2018. Agenda. Introduction Text encoding Conceptual Modelling An example – Mediaeval texts and Linkable Data Summing up.
Linkable data, linked data and texts What have the Digital Humanities to offer, based on the CIDOC-CRM and TEI. Christian-Emil Ore University of Oslo NTNU 02.11.2018
Agenda • Introduction • Text encoding • Conceptual Modelling • An example – Mediaeval texts and Linkable Data • Summing up
Virtual Research Environment – 1945 Memex (Memory Extension) with microfilm storage (based on Vannevar Bush’s paper As We May Think, 1945) https://www.youtube.com/watch?v=c539cK58ees
Linking data – 1997
“Norwegian farm names”: 139, Jaaberg. Pron: jåbber. References: i Jabærghi RB. 31, 56. Jabergh DN II 657, 1471. Iaberg NRJ. IV 127. Jabere DN III 836, 1539. [...]
Diplomatarium Norvegicum: Vol II p. 657 No. 882, Date 26 August 1471. Place: [Hyppestad] [...] JtemsworocStenulffLeidulfsonsinszfadhursordh at hangek med sin fadhuraffJaberghsom ligger iSandaHereddeghieffthersancte Johannes dagh [...]
Archaeological acquisition catalogue: 23447. Grave find from Roman iron age from the stone circle at Jåberg (farm nr. 139), Sandar parish, Vestfold county. A) Bronze fibula from older Roman period of the main type [...]
Agenda • Introduction • Text encoding • Conceptual Modelling • An example – Mediaeval texts and Linkable Data • Summing up
Henry III Fine Rolls Project (Ciula, Vieira: “Complementing and extending TEI documents with an ontology”. TEI Members Meeting 2008) Text and ontology • TEI XML: physical and logical structure; semantic content • RDF/OWL ontology: network of associations; additional statements and interpretative layers
<persName key="ashford_de_william">William de <placeName key="ashford1">Ashford</placeName></persName> <rs key="abjuration" type="subject">on the day he abjured the kingdom <persName key="rumberue_de_thomas">Thomas de <placeName key="rumberue">Rumberue</placeName></persName></rs>
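The `key` attributes are what tie the running text to instances in the ontology. As a minimal sketch (not the Fine Rolls project's own tooling), the following Python collects every keyed element from the fragment above. The wrapping `<p>` element and the function name `collect_keys` are assumptions added only so the snippet parses and runs on its own.

```python
import xml.etree.ElementTree as ET

# Fragment adapted from the slide. The wrapping <p> is an assumption,
# added only so the snippet is one well-formed document.
FRAGMENT = """
<p>
  <persName key="ashford_de_william">William de
    <placeName key="ashford1">Ashford</placeName>
  </persName>
  <rs key="abjuration" type="subject">on the day he abjured the kingdom
    <persName key="rumberue_de_thomas">Thomas de
      <placeName key="rumberue">Rumberue</placeName>
    </persName>
  </rs>
</p>
"""


def collect_keys(xml_text):
    """Return (element name, key, whitespace-normalized text) per keyed element."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():  # document order
        key = elem.get("key")
        if key is not None:
            text = " ".join("".join(elem.itertext()).split())
            rows.append((elem.tag, key, text))
    return rows


keyed = collect_keys(FRAGMENT)
for tag, key, text in keyed:
    print(f"{tag:10} {key:20} {text}")
```

Each key can then be resolved against the RDF/OWL side, where the associations and interpretative statements are kept.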
Encoding for extraction A fragment of an imaginary archaeological excavation report: “The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.”
Information extraction
<TEI> <teiHeader> … </teiHeader> <text> … <p xml:id="p1"><rs xml:id="e1">The excavation in <name type="place" xml:id="n1">Wasteland</name> in <date xml:id="d1">2005</date></rs> was performed by <name type="person" xml:id="n2">Dr. Diggey</name>. He had the misfortune of <rs xml:id="e2">breaking <rs xml:id="o1">the beautiful sword <rs xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p> … </text> </TEI>
Extracted statements:
• Actor: Dr. Diggey; Relation: performed; Event: E1; Type: excavation; Place: Wasteland; Time-span: 2005
• Actor: Dr. Diggey; Relation: performed; Event: E2; Type: Modification; Descr: Breaking the sword into 30 pieces; Relation: part of E1; Relation: in presence of; Object: Sword; Relation: identified by; Identifier: C50435
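The step from markup to statements can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a tool used in the talk: the `xml:id` values come from the slide, while the `STATEMENTS` list is written by hand from the relations shown on the slide (the markup only delimits the entities; the relations are the encoder's interpretation). The relation labels `place` and `time-span` are shorthand introduced here.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

REPORT = """<TEI><text><p xml:id="p1"><rs xml:id="e1">The excavation in
<name type="place" xml:id="n1">Wasteland</name> in
<date xml:id="d1">2005</date></rs> was performed by
<name type="person" xml:id="n2">Dr. Diggey</name>. He had the misfortune of
<rs xml:id="e2">breaking <rs xml:id="o1">the beautiful sword
<rs xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p></text></TEI>"""


def index_by_id(xml_text):
    """Map each xml:id to the whitespace-normalized text of its element."""
    root = ET.fromstring(xml_text)
    index = {}
    for elem in root.iter():
        xml_id = elem.get(XML_ID)
        if xml_id:
            index[xml_id] = " ".join("".join(elem.itertext()).split())
    return index


# Hand-written from the slide: (subject id, relation, object id).
STATEMENTS = [
    ("n2", "performed", "e1"),
    ("e1", "place", "n1"),
    ("e1", "time-span", "d1"),
    ("n2", "performed", "e2"),
    ("e2", "part of", "e1"),
    ("e2", "in presence of", "o1"),
    ("o1", "identified by", "o_id1"),
]

index = index_by_id(REPORT)
resolved = [(index[s], rel, index[o]) for s, rel, o in STATEMENTS]
for subject, relation, obj in resolved:
    print(f"{subject!r} --{relation}--> {obj!r}")
```

Keeping the statements keyed by `xml:id` rather than by text means every assertion can be traced back to the exact passage it was extracted from.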
TEI integration routes [Figure: three TEI documents, each with a Body containing <name>...</name> and <rs>...</rs> elements. They differ in what the Header holds: TEI elements such as <place>...</place> and <event>...</event> (route 1); other XML elements, shown as <...>...</...>; or RDF elements, shown as <rdf:...>...</rdf:...>.]
Agenda • Introduction • Text encoding • Conceptual Modelling • An example – Mediaeval texts and Linkable Data • Summing up
The principle of Entropy Fallacy • Massive data aggregation: • Increased amount of data = Increase of amount of information • Increased interlinking = Increase in information • Popular view: Everything is connected to everything
Ontology • An ontology is a conceptual model, that is, a formally defined model resulting from an analysis of a specific domain • not necessarily a data model in the computer science sense. • Core ontologies with universals • General ontologies with particulars (thesauri/authority systems) • a formal ontology can be expressed
Eight basic concepts for data integration • Events • Person • Place • Time/Date • Physical Objects • Conceptual Objects • Names • Types
Event-oriented analysis: Eight basic concepts for data integration Objects participate in Actors involved Abstracts where Events when Time/Date Places characterize identify Names Types
CIDOC-CRM E55 Types refer to / refine E39 Actors (persons, inst.) E28 Conceptual Objects E41 Appellations refer to / identify E18 Physical Things participate in affect or refer to E2 Temporal Entities (Events) have location at within E52 Time-Spans E53 Places
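To make the event-centred pattern concrete, the excavation example from the earlier slide can be written as CRM-style triples. The sketch below uses plain Python and emits N-Triples lines. The `http://example.org/excavation/` URIs are hypothetical, and the class and property identifiers (`E7_Activity`, `P14_carried_out_by`, and so on) follow the naming pattern of the CRM's RDFS encoding; they should be checked against the CRM version actually in use.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
EX = "http://example.org/excavation/"  # hypothetical base URI


def crm(name):
    """URI reference for a CRM class or property."""
    return f"<{CRM}{name}>"


def ex(name):
    """URI reference for an instance in the (hypothetical) example dataset."""
    return f"<{EX}{name}>"


def literal(text):
    """Plain N-Triples string literal with minimal escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


TRIPLES = [
    # E1: the excavation, an activity with actor, place and time-span
    (ex("e1"), RDF_TYPE, crm("E7_Activity")),
    (ex("e1"), crm("P14_carried_out_by"), ex("diggey")),
    (ex("e1"), crm("P7_took_place_at"), ex("wasteland")),
    (ex("e1"), crm("P4_has_time-span"), ex("year2005")),
    (ex("e1"), crm("P9_consists_of"), ex("e2")),
    # E2: breaking the sword, a modification that is part of E1
    (ex("e2"), RDF_TYPE, crm("E11_Modification")),
    (ex("e2"), crm("P14_carried_out_by"), ex("diggey")),
    (ex("e2"), crm("P12_occurred_in_the_presence_of"), ex("sword")),
    # The things, actors, places and identifiers the events connect
    (ex("sword"), RDF_TYPE, crm("E19_Physical_Object")),
    (ex("sword"), crm("P1_is_identified_by"), ex("C50435")),
    (ex("C50435"), RDF_TYPE, crm("E42_Identifier")),
    (ex("diggey"), RDF_TYPE, crm("E21_Person")),
    (ex("diggey"), RDFS_LABEL, literal("Dr. Diggey")),
    (ex("wasteland"), RDF_TYPE, crm("E53_Place")),
    (ex("year2005"), RDF_TYPE, crm("E52_Time-Span")),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


ntriples = to_ntriples(TRIPLES)
print(ntriples)
```

Note that the person, the place and the sword are never linked to each other directly; every connection runs through one of the two events.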
CIDOC-CRM (http://www.cidoc-crm.org) CIDOC Conceptual Reference Model (CRM) CRM: Few concepts, high recall Event happened at Thing Actor was present at FRBRoo LRMoo Special concepts, high precision CRMsci CRMinf CRMgeo CRMarchaeo PRESSoo CRMdig (according to Martin Doerr)
Agenda • Introduction • Text encoding • Conceptual Modelling • An example – Mediaeval texts and Linkable Data • Summing up
Collection 1 – Diplomatarium Norvegicum • Summary • Source info • Text number • Date • Place • Edited text
Collection 3 – Regesta Norvegica • Persons, places, subject, etc. are in the registries • Text witnesses • Where the charter text is published, e.g. in Diplomatarium Norvegicum
What is the original text? A more complex example After Apographa Arn. Magn., presumably from a lost codex, Bergen (Barth. IV (E) 378 – 374) (Printed in Thork. Dipl. II 25)
What is the original text? • 27 July 1228, Perugia, original • 9 May 1311, Bergen, vidimus (copy) • Lost some time after 1311 • To Copenhagen, ca 1670 – 1690 • Transcribed in Bartholin’s Collectanea, 1690 • Lost in the great fire, Copenhagen, 1728? • Printed Thork. Dipl. II. 25. Copenhagen/Leipzig 1786 • 27 July 1228 part printed DN II 1851 • 9 May 1311 frame (vidimus) printed DN IX 1876
Norwegian Charters • Diplomatarium Norvegicum • 23 volumes, cover 1100 to 1582 • Published 1846 – 2011 • Retro-digitized, TEI P5 encoding • Newer transcripts • Old Norwegian 1170 – 1405, 4000 transcripts • TEI P5, no metadata, only identifier • Regesta Norvegica • 9 volumes, cover 1100 to 1408 • Very rich in metadata • TEI P5 encoded
Tools & Methods • Encoding the original texts as XML documents • Text Encoding Initiative, tei-c.org • Medieval Nordic Text Archive, menota.org • Metadata expressed compliant with ontologies • Cultural heritage view: CIDOC-CRM (ISO 21127) • Library/bibliographic view: FRBR/LRM (FRBRoo/LRMoo) • Encoding of metadata • TEI-XML for presentation and archival purposes • RDF for linked data
Linked data – TEI-XML documents
<TEI ...> <teiHeader> <fileDesc> <!-- All kinds of metadata --> <!-- Persons, places, bibl. ref., text witnesses etc. --> </fileDesc> </teiHeader> <text> <!-- The XML-encoded proper text goes here --> ... </text> </TEI>
• Part 1, the proper text • Part 2, data for Linked Data (semantic web): additional structure with extracted assertions/metadata from the document, expressed in RDF/XML
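The two-part idea can be sketched as a function that reads one TEI document and returns the proper text (part 1) together with a handful of triples derived from the header and the references in the text (part 2). Everything specific here is an assumption made for illustration: the tiny demo document, the `http://example.org/` base URI, the placement of `listPerson` in `sourceDesc`, and the choice of `dcterms:references` as the linking property. The triples are kept as plain tuples rather than serialized as RDF/XML.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE = "http://example.org/report/demo"  # hypothetical document URI
REFERENCES = "http://purl.org/dc/terms/references"  # illustrative choice
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical demo document, reusing the excavation example.
DOC = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Demo report</title></titleStmt>
      <publicationStmt><p>Illustration only</p></publicationStmt>
      <sourceDesc>
        <listPerson>
          <person xml:id="diggey"><persName>Dr. Diggey</persName></person>
        </listPerson>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p><name type="person" ref="#diggey">Dr. Diggey</name>
  performed the excavation.</p></body></text>
</TEI>"""


def split_document(xml_text):
    """Return (plain text of <text>, triples linking the text to header persons)."""
    root = ET.fromstring(xml_text)
    header = root.find("tei:teiHeader", NS)
    text_el = root.find("tei:text", NS)

    # Part 1: the proper text, with markup stripped and whitespace normalized.
    plain = " ".join("".join(text_el.itertext()).split())

    # Header metadata: persons declared with an xml:id.
    people = {
        person.get(XML_ID): person.findtext("tei:persName", namespaces=NS)
        for person in header.iterfind(".//tei:person", NS)
    }

    # Part 2: one pair of triples per reference from the text into the header.
    triples = []
    for name in text_el.iter(f"{{{TEI_NS}}}name"):
        target = name.get("ref", "").lstrip("#")
        if target in people:
            person_uri = f"{BASE}#{target}"
            triples.append((BASE, REFERENCES, person_uri))
            triples.append((person_uri, LABEL, people[target]))
    return plain, triples


plain, triples = split_document(DOC)
print(plain)
for triple in triples:
    print(triple)
```

The TEI file stays the archival and presentation format, while the derived triples are what gets published as linked data.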
Possible points for external links • Regesta Norvegica/Diplomatarium Norvegicum • Persons, places, subject, onomastic information • Creation date, place • Text witnesses, archival signature, provenance • Cross references for copies (vidimus) etc. • Published, mentioned, bibliographic references • Transcripts • Text witnesses, archival signature • Linguistic information
Agenda • Intro • Text encoding • Conceptual Modelling • An example – Mediaeval texts and Linkable Data • Summing up
The well-known 5 stars of Linked Data • Data is available on the Web, in whatever format. • Available as machine-readable structured data (e.g., not a scanned image). • Available in a non-proprietary format (e.g., CSV, not Microsoft Excel). • Published using open standards from the W3C (RDF and SPARQL). • All of the above and links to other Linked Open Data.
Two additional stars • The schemas (vocabularies/models) used in the dataset are explicitly described and published alongside the dataset, unless the schemas are already available somewhere on the Web. • The quality of the dataset against the RDF schemas used in it must be explicated, so that the user can evaluate whether the data quality matches her needs. (Hyvönen, E., Tuominen, J., Alonen, M. and Mäkelä, E., 2014)
Some conclusions I • The design of conceptual models is in itself a scholarly activity. It must be based on a stringent analysis of the scientific practice and source material of a given field. • Well-defined ontologies may also act as an intellectual guide in the scholarly analysis of source material. • Without the use of common standard models like the CIDOC-CRM, data integration can only be done on a trivial level.
Some conclusions II • Combining existing ontologies uncritically may result in unintended connections. • Ad hoc bottom-up methods for data integration may be useful but must be complemented by top-down methods provided by well-founded conceptual models.
Thank you for your attention • Contact details: • Email: c.e.s.ore@iln.uio.no • References: • CIDOC-CRM: cidoc-crm.org • TEI: tei-c.org • Current (old) versions of Dipl. Norv., Norw. Farm Names, and Reg. Norv.: www.dokpro.uio.no