Accessing treasure on lands and peoples. Peter Burnhill Director, EDINA, University of Edinburgh. Inspired by a Keynote remark by Professor Gillies …. Credits: who planned the dive & dived the wreck.
Accessing treasure on lands and peoples
Peter Burnhill, Director, EDINA, University of Edinburgh
Credits: who planned the dive & dived the wreck
• The team within EDINA:
  Des Reid, Senior Software Engineer
  Dimitrios Sferopolous, Software Engineer
  Neil Mayo, Software Engineer
• Jackie Clark, Web Designer
  led by Christine Rees, Head of Bibliographic & Multimedia
• And those to whom we all owe lots (in Centre for Research Collections, IS Library & Collections):
  Kirsty Stewart (project manager and archivist)
  Lesley Bryson nee Doig (initial project manager)
  Grant Buttars, Deputy University Archivist
  Andrew Wiseman, Researcher, TEI expert
  Donald William Stewart, Senior Project Researcher
  led by Arnott Wilson (University Archivist) & John Scally (University Collections)
Languages & Perspectives
Digital Library has mixed parentage: a ‘re-mix’ of the document tradition & the computation tradition
• “approaches based on a concern with documents, with signifying records: archives, bibliography, documentation, librarianship, records management, and the like …” [Domain knowledge speak]
• “approaches based on uses of formal techniques, whether mechanical (such as punch cards and data-processing equipment) or mathematical/computational (as in algorithmic procedures).” [Software engineer speak]
Prof. Michael Buckland, Presidential Address, American Society for Information Science, JASIS’s 50th (1998) http://people.ischool.berkeley.edu/~buckland/asis62.html
Heard report on work from the Dive Team
• On from the marvels of reading and interpreting the marks on paper in the notebook entries
• … and the meticulous transcription into machine-readable text
• … and their tagging using Encoded Archival Description (EAD) with text in XML format*
* mark-up that software can process more easily
Example of the XML EAD data (1)
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "ead.dtd">
<c level="item" id="GB-237-Coll-97-CW114-42">
  <did>
    <unitid encodinganalog="isadg(2)311" label="Reference code">GB 237 Coll-97/CW114/42</unitid>
    <physloc label="Shelfmark" encodinganalog="shelfmark">CW102-121</physloc>
    <unittitle encodinganalog="isadg(2)312">Song about Uamh-an-Oir, accompanying story and notes</unittitle>
    <unitdate encodinganalog="isadg(2)313" certainty="certain" normal="" type="inclusive">1867</unitdate>
    <repository label="Repository" encodinganalog="NAHSTE31">Edinburgh University Library, Special Collections</repository>
    <physdesc label="Extent and Medium of the Unit of Description" encodinganalog="isadg(2)315" audience="external">
      <extent>folio 67v, line 17 to folio 68r, line 4</extent>
      <dimensions/>
    </physdesc>
    <!--Replace language code if other than English with ISO 639-2 three letter language code. Add further language tags if necessary-->
    <langmaterial>
      <language langcode="gla">Gaelic</language>
Work at the Refactory
The XML files were passed to engineers at EDINA … import script in Perl that parses the XML and constructs the relational structure, with reference to an existing database schema as shown.
• The green boxes indicate high-level entities: catalogue entry, its transcript and images.
• The pink ‘cat_*’ boxes are links from catalogue entries to such things as places, people and subjects.
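The import step can be illustrated with a minimal sketch. The script described on the slide was written in Perl against EDINA's existing schema, neither of which is reproduced here, so this is a Python stand-in under stated assumptions: SQLite replaces the real database, the table and column names (`catalogue`, `cat_subject`, `cat_person`) are hypothetical apart from the `cat_` prefix mentioned above, and `SAMPLE` is a trimmed, well-formed record modelled on the EAD excerpts in these slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the EAD excerpts in this deck (not the full record).
SAMPLE = """
<c level="item" id="GB-237-Coll-97-CW114-42">
  <did>
    <unitid>GB 237 Coll-97/CW114/42</unitid>
    <unittitle>Song about Uamh-an-Oir, accompanying story and notes</unittitle>
  </did>
  <controlaccess>
    <controlaccess>
      <subject authfilenumber="218">Caves</subject>
      <subject authfilenumber="327">Dogs</subject>
      <subject>Waulking songs</subject>
    </controlaccess>
    <controlaccess>
      <persname authfilenumber="4278">MacNeil | Roderick | c1790-1875 | Ruaraidh an Rùma | crofter | Mingulay</persname>
    </controlaccess>
  </controlaccess>
</c>
""".strip()


def create_schema(conn):
    """Create hypothetical tables: one entity table plus two cat_* link tables."""
    conn.executescript("""
        CREATE TABLE catalogue (id INTEGER PRIMARY KEY, ref TEXT, title TEXT);
        CREATE TABLE cat_subject (catalogue_id INTEGER, authfile TEXT, term TEXT);
        CREATE TABLE cat_person (catalogue_id INTEGER, authfile TEXT, heading TEXT);
    """)


def import_item(conn, xml_text):
    """Parse one EAD <c> item and store it as a catalogue row plus link rows."""
    item = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO catalogue (ref, title) VALUES (?, ?)",
        (item.findtext("did/unitid"), item.findtext("did/unittitle")),
    )
    cat_id = cur.lastrowid
    # Index terms may lack an authority file number; that is stored as NULL.
    for subj in item.iter("subject"):
        conn.execute(
            "INSERT INTO cat_subject VALUES (?, ?, ?)",
            (cat_id, subj.get("authfilenumber"), subj.text),
        )
    for pers in item.iter("persname"):
        heading = " ".join(pers.text.split())  # collapse stray whitespace only
        conn.execute(
            "INSERT INTO cat_person VALUES (?, ?, ?)",
            (cat_id, pers.get("authfilenumber"), heading),
        )
    return cat_id
```

The person heading is stored as one string rather than split on the `|` separators, since the meaning of each position is not stated in the slides.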
Example of the XML EAD data (2)
<!--Insert controlaccess index terms here if needed-->
<!--Delete any tags not required-->
<controlaccess encodinganalog="NAHSTE38">
  <head>Index</head>
  <controlaccess encodinganalog="NAHSTE381">
    <head>Subjects</head>
    <subject authfilenumber="218">Caves</subject>
    <subject authfilenumber="327">Dogs</subject>
    <subject authfilenumber="2665">Hair</subject>
    <subject authfilenumber="3910">Loss (of people or things)</subject>
    <subject authfilenumber="3814">Men</subject>
    <subject authfilenumber="3936">Rescues</subject>
    <subject>Waulking songs</subject>
  </controlaccess>
  <controlaccess encodinganalog="NAHSTE382">
    <head>People</head>
    <persname authfilenumber="4708">| Mor Iain ic Dhòmhnaill Bhàin | fl1867 | Isle of Barra | Inverness-shire</persname>
    <persname authfilenumber="4278">MacNeil | Roderick | c1790-1875 | Ruaraidh an Rùma | crofter | Mingulay</persname>
  </controlaccess>
Example of the XML EAD data (3)
      <language langcode="eng">English</language>
    </langmaterial>
    <origination label="Name of Creator(s)" encodinganalog="isadg(2)321">Alexander Carmichael</origination>
  </did>
  <scopecontent encodinganalog="isadg(2)331">
    <head>Scope and Content</head>
    <p>Song about Uamh-an-Oir probably collected from Roderick MacNeil, aged 88, crofter, Miùghlaigh/Mingulay beginning 'Na minn bheaga na minn bheaga/theaga, Dol eir creagan dol sna creag' composed of thirteen lines. Uamh-an-Oir is described as starting at Cliata cliff and going under Barra to Gearragaal east of Orasay [Uamh an Òir, Cliaid, Orasaigh, Barraigh/Isle of Barra]. The story tells how five men went into the cave with dogs but only the dogs returned and they were hairless. 'The smith of Loch an Duin [Loch an Dùin] put out the torches. Great men sent them in against their will.' Carmichael writes a note to himself to see Mor Iain ic Dhonuil Bhain [Mòr Iain ic Dhòmhnaill Bhàin] for the 'oran sith sung here at the luadh...She Knows all about the songs made'. A vocabulary note reads ' "Fiallan fiadhaich" An insect on the brain &c!' Written transversely over the text in ink is 'Transcribed Book No III page 62 A[lexander] C[armichael]'.</p>
Work at the Refactory
This structure is imported into Solr, software used …
• to control searching copies of the text (which have been normalised for more effective searching)
• … and for retrieval of text and images to be rendered on the website
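The idea of searching a normalised copy while displaying the original text can be sketched as follows. The slides do not say which normalisation rules were applied, so the rules here (lower-casing, stripping diacritics, collapsing whitespace) are assumptions, and the plain dictionary is only a stand-in for Solr; in Solr itself this kind of folding would normally be configured in a field's analysis chain rather than written by hand. The function names are hypothetical.

```python
import unicodedata


def normalise(text):
    """Return a search-friendly copy: no diacritics, lower case, single spaces."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.lower().split())


def build_index(records):
    """Keep the normalised copy for matching and the original for display."""
    return {ref: (normalise(text), text) for ref, text in records.items()}


def search(index, query):
    """Match the normalised query against normalised copies; return originals."""
    needle = normalise(query)
    return [
        (ref, original)
        for ref, (searchable, original) in index.items()
        if needle in searchable
    ]
```

With this in place, a reader typing `uamh an oir` without accents would still find text containing `Uamh an Òir`, and the website would render the accented original.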
Tobar an Dualchais
6,000 new items now available to search & play
Over 24,000 tracks of stories, songs, music, poetry and factual information recorded in Scotland and further afield, from the 1930s onwards.
• Thousands of oral recordings made in Scotland and further afield, from the 1930s onwards
• including stories, songs, music, poetry and factual information
• HLF funding
• Joint project: Sabhal Mòr Ostaig, University of Edinburgh, BBC Scotland, National Trust for Scotland
Early work between EDINA & Special Collections
• SCIMSS Special Collections Index of Manuscripts, 1995/96
• Once an ‘advanced’ Web service, now retired: Wayback Machine ..
The web index was created from the Special Collections’ departmental sets of 180 binders comprising, in alphabetical order, about 54,000 loose-leaf slips containing varied typescript dating from the 1930s.
Early work between EDINA & Special Collections
• Statistical Accounts of Scotland, the Sinclair 1790 & 1840 statistical reports on parishes of Scotland, 1999/2001
• a service with Editorial Committee chaired by Dr Ann Matheson
• NAHSTE / GASHE, 2000/03
• Navigational Aids for the History of Science, Technology and the Environment: collections of archives & manuscripts held in Edinburgh, Glasgow & Heriot-Watt Universities
• Gateway to Archives of Scottish Higher Education: descriptions of archives dating from 1215 to present day.
• Gavin Inglis at EDINA created free text and index controlled searching facilities based on the underlying structure of the XML formatted data, with simple navigation between data components and an easy means of updating existing data.
• Launched in 2003, no subscription fee
• Now used by 379 licensed institutions
• Several hundred hours of film, across a wide range of subject areas and topics
• Collections include: Imperial War Museum, Films of Scotland, Royal Mail Film Classics, Digital Himalaya, Culverhouse Classical Music, Logic Lane, Wellcome Film, Biochemical Society, Healthcare Productions, St George’s Medical School Collection, Education & Television Films Ltd, Amber Films, Performance Shakespeare
• Followed BUFVC/OU project for metadata, digitisation & rights clearance
• http://www.filmandsound.ac.uk/
• 3,000 hours of video footage
• Collections include: Gaumont Newsreels, News at Ten, ITN News Reports, Channel 4 News, Reuters archives, Roving Report
• 60,000 news stories
• + 25,000 ITN programme scripts
• + unreleased footage
• Launched in 2008, no subscription fee; uptake has now grown to 344 universities & colleges
• http://www.nfo.ac.uk/
• Worked with BUFVC, who led the project for metadata, digitisation and rights clearance
• Initially launched in 2004
• Getty Images to Sept 2010
• Digital Images for Education from Oct 2010
• Schools service started in 2008 and ran until Sept 2010
• Engaging 88 subscribing universities & colleges
• http://edina.ac.uk/eig/
1 million image, video and sound resources to discover & use
8 Collections so far
45 Collections so far
British Library Archival Sound Recordings
… a rich ecosystem … from food delivered to a one-time wreck
Thank you
http://edina.ac.uk