Explore The Prodromus of the Zoology of Victoria's illustrations and manuscripts on Museum Victoria's website. Learn about the EMu data structure used for collection management and the complex data sets capturing science and art information. Discover how EMu records are linked with website sections, narratives, and multimedia for a comprehensive experience. Find out the process of data extraction from EMu reports to SQL Server, XML tagging, and storing HTML or XML in EMu for web display. Uncover the challenges and solutions for scientific names display requirement, including tools like TaxonGrab and FindIT. Dive into future possibilities of direct EMu integration and utilizing professional XML authoring tools for narrative creation.
The Caught and Coloured website: its EMu origins • Alex Chubaty – Collection Information Systems • Craig Churchill – IT Software Development • Museum Victoria
Background • Documents the illustration of fauna in colonial Victoria in The Prodromus of the Zoology of Victoria • Collection of artwork and manuscripts held in MV Archives • Website managed by MV Online Publishing team
Basic Data structure • Data used in website collected initially for purposes of collection management • Two kinds of items catalogued • Parent/Child structure of records
Data collection • Complex data set capturing information relating to science and art • Used Catalogue, Parties, Bibliography, Taxonomy, Collection Events & Sites, Multimedia (MMR) and later, Narratives modules • Partitioning/tab switching • Early data recorded first in spreadsheet then transferred to EMu
EMu records and relationship with website • Data and images collected in EMu used in ‘Collection’ section of website • Searchable under headings or groupings of types of fauna • Once a faunal group is selected individual species as represented in drawings, prints and notes can be browsed
Additional data linked to Catalogue • Some data added to MMR records used in website: • Title field = Caption • Metadata tab = alt tag
Additional data linked to Catalogue • Other types of data added to Narratives module and linked to Catalogue records: 1. Narrative about the faunal group 2. McCoy’s description of species in the Prodromus 3. Kate Phillips’ description of species from Melbourne’s Wildlife • Numbers 2 & 3 flagged in Narratives Identifier field • Number 1 has relevant Catalogue records attached
Other sections of website using Narratives • McCoy’s Zoology of Victoria • Natural Observations • Stories from Nature • Each section a Master Narrative with several sub Narratives • Each sub Narrative may have its own sub Narrative • Associated images also entered into MMR and linked to the Narratives records
Getting Data out of EMu • EMu reports created using select data • Separate reports for Catalogue, Narratives and MMR records • Reports exported in Excel format
Into SQL Server • Perl script reads Excel reports and loads data including images into SQL Server • Creates a table for each module and necessary relationship tables • Ecatalogue • EcatalogueMultimedia • Captures values in labelled text fields and loads into separate fields • Attempts to identify Scientific names and surround with <sn> tags • Not a fully automated process, takes approximately 30 minutes to update data
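The module tables and relationship tables described above can be sketched roughly as follows. This is a minimal illustration only, using Python's built-in sqlite3 as a stand-in for SQL Server; it is not the Perl script from the talk. The table names Ecatalogue and EcatalogueMultimedia come from the slide, while Emultimedia, every column name, and the sample rows are assumptions made for the example.

```python
import sqlite3


def build_tables(conn):
    """Create one table per module plus a relationship table.

    Column names are hypothetical; the slide only names the tables
    Ecatalogue and EcatalogueMultimedia.
    """
    conn.executescript(
        """
        CREATE TABLE Ecatalogue (irn INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE Emultimedia (irn INTEGER PRIMARY KEY, caption TEXT);
        CREATE TABLE EcatalogueMultimedia (
            catalogue_irn INTEGER REFERENCES Ecatalogue(irn),
            multimedia_irn INTEGER REFERENCES Emultimedia(irn),
            PRIMARY KEY (catalogue_irn, multimedia_irn)
        );
        """
    )


def load_rows(conn, catalogue, multimedia, links):
    """Load rows that would, in the real process, come from the exported reports."""
    conn.executemany("INSERT INTO Ecatalogue VALUES (?, ?)", catalogue)
    conn.executemany("INSERT INTO Emultimedia VALUES (?, ?)", multimedia)
    conn.executemany("INSERT INTO EcatalogueMultimedia VALUES (?, ?)", links)
    conn.commit()


def captions_for(conn, catalogue_irn):
    """Return captions of all multimedia records linked to one catalogue record."""
    rows = conn.execute(
        """
        SELECT m.caption
        FROM Emultimedia AS m
        JOIN EcatalogueMultimedia AS cm ON cm.multimedia_irn = m.irn
        WHERE cm.catalogue_irn = ?
        ORDER BY m.irn
        """,
        (catalogue_irn,),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping the links in their own table lets one catalogue record carry several images, and one image be reused by several records, without duplicating either row.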
Out to the Web • ASP.NET environment using VB.NET • Images served directly from database and resized dynamically (thumbnails) • XML tags in data converted to HTML or used as processing instructions • e.g. <hst> converted to <div class="historic-text"></div>
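The tag conversion step might look something like the sketch below. It is written in Python for illustration, not the VB.NET used on the site. The <hst> mapping is taken from the slide; rendering <sn> as <i> is an assumption based on the requirement that scientific names be italicised, and the function name xml_to_html is hypothetical.

```python
import re

# Custom tag -> (HTML element, optional CSS class).
# "hst" is from the slide; the "sn" mapping is an assumption.
TAG_MAP = {
    "hst": ("div", "historic-text"),
    "sn": ("i", None),
}

_TAG_PATTERN = re.compile(r"<(/?)(%s)>" % "|".join(TAG_MAP))


def xml_to_html(text):
    """Replace known custom tags with HTML; leave everything else untouched."""

    def replace_tag(match):
        closing, name = match.group(1), match.group(2)
        html_name, css_class = TAG_MAP[name]
        if closing:
            return "</%s>" % html_name
        if css_class:
            return '<%s class="%s">' % (html_name, css_class)
        return "<%s>" % html_name

    return _TAG_PATTERN.sub(replace_tag, text)
```

A lookup table keeps the mapping in one place, so adding another custom tag needs only a new entry rather than new code.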
Marking Up the Content • Storing HTML in EMu • Is this a good thing to do? • What are the alternatives? • a less intrusive mark-up like WikiWikiWeb c2.com/cgi/wiki • store HTML in EMu but don’t display • use XML instead • Storing XML in EMu • What Schema should we use? • Should we create our own? • Investigate existing Schemas • Text Encoding Initiative http://www.tei-c.org/ • Use XSLT PageView to preview
Scientific Names • Requirement: “All scientific names should be italicised when displayed on the web.” • Problem: “How do we identify scientific names contained within a text field if they haven’t been tagged?”
Scientific Names (cont) Possible solutions • Cross reference/link text against taxonomy module • Check text against a pre-built list • TaxonGrab – Natural Language Processing solution written in PHP: • http://sourceforge.net/projects/taxongrab • FindIT – parses free text and identifies scientific names and author combinations: • http://names.mbl.edu/tools/recognize.php • Currently testing this technology, initial results are promising
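The "check text against a pre-built list" option could be sketched as below. This is only an illustration of that one approach, not of TaxonGrab or FindIT, and the function name tag_scientific_names and the sample names are assumptions. It wraps each listed name in <sn> tags and skips text that is already tagged.

```python
import re


def tag_scientific_names(text, known_names):
    """Wrap each known name found in text with <sn> tags.

    known_names stands in for a pre-built list, for example one
    exported from a taxonomy module (an assumption for this sketch).
    """
    if not known_names:
        return text
    # Longest names first, so a binomial wins over its bare genus.
    ordered = sorted(known_names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # The first branch consumes spans that are already tagged, so they
    # pass through unchanged instead of being wrapped a second time.
    pattern = re.compile(r"<sn>.*?</sn>|\b(%s)\b" % alternation, re.DOTALL)

    def wrap(match):
        if match.group(1) is None:
            return match.group(0)
        return "<sn>%s</sn>" % match.group(1)

    return pattern.sub(wrap, text)
```

A list lookup only finds names it already knows and matches exact spellings, which is one reason the name-recognition tools listed above are worth testing alongside it.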
Future Possibilities • Hit EMu directly, no more exporting data to SQL Server • Use KE PHP web and web services libraries • record extractor object • xml, xslt and xpath • Investigate professional XML authoring tools to allow authors to create narratives that are valid and well formed