The Caught and Coloured website: its EMu origins. Alex Chubaty – Collection Information Systems; Craig Churchill – IT Software Development, Museum Victoria. www.museum.vic.gov.au/caughtandcoloured
Background • Documents the illustration of fauna in colonial Victoria in The Prodromus of the Zoology of Victoria • Collection of artwork and manuscripts held in MV Archives • Website managed by MV Online Publishing team
Basic data structure • Data used in the website was collected initially for collection-management purposes • Two kinds of items catalogued • Parent/child structure of records
Data collection • Complex data set capturing information relating to both science and art • Used the Catalogue, Parties, Bibliography, Taxonomy, Collection Events & Sites, Multimedia (MMR) and, later, Narratives modules • Partitioning/tab switching • Early data recorded first in a spreadsheet, then transferred to EMu
EMu records and relationship with website • Data and images collected in EMu are used in the ‘Collection’ section of the website • Searchable under headings for groupings of fauna types • Once a faunal group is selected, individual species, as represented in drawings, prints and notes, can be browsed
Additional data linked to Catalogue • Some data added to MMR records is used in the website: • Title field = caption • Metadata tab = alt tag
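The Title-to-caption and Metadata-to-alt mapping can be sketched as a small rendering function. This is a hypothetical illustration in Python, not the site's ASP.NET code: the dictionary keys `image_url`, `title` and `alt_text` are assumed names for the exported values, not EMu field names.

```python
from html import escape


def render_figure(mmr_record: dict) -> str:
    """Build an HTML figure from an exported MMR record.

    Hypothetical keys: "image_url", "title" (Title field) and
    "alt_text" (value from the Metadata tab). These are not EMu field names.
    """
    src = escape(mmr_record["image_url"], quote=True)
    alt = escape(mmr_record.get("alt_text", ""), quote=True)  # Metadata tab -> alt attribute
    caption = escape(mmr_record.get("title", ""))              # Title field -> caption
    return (
        f'<figure><img src="{src}" alt="{alt}">'
        f"<figcaption>{caption}</figcaption></figure>"
    )


if __name__ == "__main__":
    print(render_figure({
        "image_url": "/images/example.jpg",
        "title": "Example caption",
        "alt_text": "Example alt text",
    }))
```

Escaping both values matters because captions and alt text are free text entered by cataloguers.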
Additional data linked to Catalogue • Other types of data are added to the Narratives module and linked to Catalogue records: • Narrative about the faunal group • McCoy’s description of species in the Prodromus • Kate Phillips’ description of species from Melbourne’s Wildlife • The McCoy and Phillips descriptions (items 2 & 3) are flagged in the Narratives Identifier field • The faunal-group narrative (item 1) has the relevant Catalogue records attached
Other sections of the website using Narratives • McCoy’s Zoology of Victoria • Natural Observations • Stories from Nature • Each section is a master Narrative with several sub-Narratives • Each sub-Narrative may have its own sub-Narratives • Associated images are also entered into MMR and linked to the Narratives records
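Because sub-Narratives can nest to any depth, a site section is effectively a tree rooted at its master Narrative. A minimal sketch, assuming a hypothetical in-memory `Narrative` class (not an EMu structure), shows a depth-first walk that could drive section navigation:

```python
from dataclasses import dataclass, field


@dataclass
class Narrative:
    """Hypothetical model of a Narratives record and its sub-Narratives."""
    title: str
    children: list["Narrative"] = field(default_factory=list)


def outline(node: Narrative, depth: int = 0) -> list[str]:
    """Depth-first walk returning one indented line per Narrative."""
    lines = ["  " * depth + node.title]
    for child in node.children:
        # Recursion handles sub-Narratives nested to any depth.
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    section = Narrative("Stories from Nature", [
        Narrative("Story A", [Narrative("Part 1")]),
        Narrative("Story B"),
    ])
    print("\n".join(outline(section)))
```

"Story A", "Story B" and "Part 1" are placeholders; only the section name comes from the slide.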
Getting data out of EMu • EMu reports created using selected data • Separate reports for Catalogue, Narratives and MMR records • Reports exported in Excel format
Into SQL Server • Perl script reads the Excel reports and loads the data, including images, into SQL Server • Creates a table for each module plus the necessary relationship tables, e.g. • Ecatalogue • EcatalogueMultimedia • Extracts values from labelled text fields and loads them into separate fields • Attempts to identify scientific names and wrap them in <sn> tags • Not a fully automated process; updating the data takes approximately 30 minutes
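The load step can be sketched in Python with the standard-library `sqlite3` module standing in for SQL Server and for the Perl script. Several things here are assumptions: that labelled text fields hold one `Label: value` pair per line, the `Emultimedia` table name, and all column names (`irn`, `title`, `medium`, etc.). Only `Ecatalogue` and `EcatalogueMultimedia` come from the slide.

```python
import re
import sqlite3

# Assumed layout: one "Label: value" pair per line inside a labelled text field.
LABEL_LINE = re.compile(r"^\s*([A-Za-z][\w ]*?)\s*:\s*(.*?)\s*$")


def split_labelled_text(text: str) -> dict[str, str]:
    """Split a labelled text field into {label: value}; unlabelled lines are skipped."""
    fields = {}
    for line in text.splitlines():
        match = LABEL_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def create_schema(conn: sqlite3.Connection) -> None:
    """One table per module plus a relationship table (column names are hypothetical)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS Ecatalogue (
            irn INTEGER PRIMARY KEY, title TEXT, medium TEXT);
        CREATE TABLE IF NOT EXISTS Emultimedia (
            irn INTEGER PRIMARY KEY, caption TEXT, image BLOB);
        CREATE TABLE IF NOT EXISTS EcatalogueMultimedia (
            catalogue_irn INTEGER REFERENCES Ecatalogue(irn),
            multimedia_irn INTEGER REFERENCES Emultimedia(irn),
            PRIMARY KEY (catalogue_irn, multimedia_irn));
    """)


def load_catalogue_row(conn, irn, title, labelled_text, multimedia_irns):
    """Insert one Catalogue row, promoting the hypothetical "Medium" label to its own column."""
    fields = split_labelled_text(labelled_text)
    conn.execute(
        "INSERT INTO Ecatalogue (irn, title, medium) VALUES (?, ?, ?)",
        (irn, title, fields.get("Medium")),
    )
    conn.executemany(
        "INSERT INTO EcatalogueMultimedia (catalogue_irn, multimedia_irn) VALUES (?, ?)",
        [(irn, m) for m in multimedia_irns],
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load_catalogue_row(conn, 1, "Plate 1", "Medium: watercolour\nNotes: see file", [10, 11])
    print(conn.execute("SELECT * FROM Ecatalogue").fetchall())
```

Keeping the Catalogue-to-Multimedia links in a separate relationship table lets one image be attached to several Catalogue records without duplicating the image data.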
Out to the Web • ASP.NET environment using VB.NET • Images served directly from the database and resized dynamically (thumbnails) • XML tags in the data converted to HTML or used as processing instructions • e.g. <hst> converted to <div class="historic-text"></div>
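The tag-to-HTML conversion can be sketched as a lookup table plus a regex substitution. This Python version only illustrates the idea (the site itself used VB.NET). The `<hst>` mapping is from the slide; the `<sn>` to `<i class="scientific-name">` mapping is an assumption, chosen because HTML's `<i>` element is intended for taxonomic names. The sketch also assumes tags carry no attributes and that the surrounding text is already HTML-safe.

```python
import re

# Assumed tag vocabulary: <hst> -> historic-text div is from the slides;
# the <sn> -> <i> entry is an illustrative assumption.
TAG_MAP = {
    "hst": ('<div class="historic-text">', "</div>"),
    "sn": ('<i class="scientific-name">', "</i>"),
}

# Matches bare opening or closing tags such as <hst> or </sn> (no attributes).
TAG_RE = re.compile(r"<(/?)([a-z]+)>")


def to_html(text: str) -> str:
    """Rewrite known in-house tags as HTML; unknown tags are left as-is."""
    def replace(match):
        is_closing, name = match.group(1) == "/", match.group(2)
        if name not in TAG_MAP:
            return match.group(0)
        open_tag, close_tag = TAG_MAP[name]
        return close_tag if is_closing else open_tag

    return TAG_RE.sub(replace, text)


if __name__ == "__main__":
    print(to_html("<hst>Notes on <sn>Ornithorhynchus anatinus</sn></hst>"))
```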
Marking Up the Content • Storing HTML in EMu • Is this a good thing to do? • What are the alternatives? • a less intrusive mark-up like WikiWikiWeb (c2.com/cgi/wiki) • store HTML in EMu but don’t display it • use XML instead • Storing XML in EMu • What schema should we use? • Should we create our own? • Investigate existing schemas • Text Encoding Initiative http://www.tei-c.org/ • Use XSLT PageView to preview
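To show what a "less intrusive mark-up" might look like in practice, here is a minimal Python sketch of the WikiWikiWeb emphasis convention (`''italic''`, `'''bold'''`) converted to HTML. It covers only those two rules and ignores nested or mixed quote runs; it is an illustration of the alternative, not a proposal from the slides.

```python
import re
from html import escape

# WikiWikiWeb-style emphasis: '''bold''' and ''italic''.
BOLD = re.compile(r"'''(.+?)'''")
ITALIC = re.compile(r"''(.+?)''")


def wiki_to_html(text: str) -> str:
    """Convert a small subset of wiki mark-up to HTML."""
    out = escape(text, quote=False)      # escape &, <, > but keep apostrophes for the mark-up
    out = BOLD.sub(r"<b>\1</b>", out)    # bold first so ''' is not read as ''
    out = ITALIC.sub(r"<i>\1</i>", out)
    return out


if __name__ == "__main__":
    print(wiki_to_html("McCoy's ''Prodromus'' is '''illustrated'''"))
```

The appeal for cataloguers is that the raw field stays readable inside EMu, while the web layer still gets clean HTML.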
Scientific Names Requirement “All scientific names should be italicised when displayed on the web.” Problem “How do we identify scientific names contained within a text field if they haven’t been tagged?”
Scientific Names (cont.) Possible solutions • Cross-reference/link text against the Taxonomy module • Check text against a pre-built list • TaxonGrab – Natural Language Processing solution written in PHP: • http://sourceforge.net/projects/taxongrab • FindIT – parses free text and identifies scientific names and author combinations: • http://names.mbl.edu/tools/recognize.php • Currently testing this technology; initial results are promising
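The "pre-built list" option can be sketched as a single compiled regex built from the list, wrapping each hit in `<sn>` tags. This does not reproduce TaxonGrab or FindIT; the list contents are placeholders, and the sketch assumes the input text has not already been tagged.

```python
import re


def build_matcher(names):
    """Compile a pre-built list of names into one regex, longest names tried first."""
    if not names:
        raise ValueError("name list is empty")
    # Python's regex alternation takes the first alternative that matches,
    # so longer names must come first ("Genus species" before "Genus").
    alternatives = sorted(set(names), key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(n) for n in alternatives) + r")\b"
    return re.compile(pattern)


def tag_names(text, matcher):
    """Wrap every listed name in <sn> tags (assumes the text is not already tagged)."""
    return matcher.sub(r"<sn>\1</sn>", text)


if __name__ == "__main__":
    matcher = build_matcher(["Ornithorhynchus", "Ornithorhynchus anatinus"])
    print(tag_names("Notes on Ornithorhynchus anatinus and the genus Ornithorhynchus.", matcher))
```

A list-based approach only finds names already in the list; the NLP tools above aim to catch names that are not.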
Future Possibilities • Query EMu directly, with no more exporting of data to SQL Server • Use KE PHP web and web services libraries • record extractor object • XML, XSLT and XPath • Investigate professional XML authoring tools to allow authors to create narratives that are valid and well-formed
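If records were retrieved from EMu as XML, XPath queries could replace the SQL Server relationship tables. The KE PHP libraries are not reproduced here; this Python sketch uses the standard-library `xml.etree.ElementTree`, which supports only a subset of XPath. The XML shape (`record`, `multimedia`, `irn` attribute) is entirely hypothetical. Full XPath and XSLT would need a library such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export shape; element and attribute names are assumptions.
SAMPLE = """<records>
  <record irn="1"><title>Plate 1</title><multimedia irn="10"/><multimedia irn="11"/></record>
  <record irn="2"><title>Plate 2</title></record>
</records>"""


def multimedia_for(xml_text: str, irn: int) -> list[str]:
    """Return the multimedia irns attached to one record, using an XPath-style query."""
    root = ET.fromstring(xml_text)
    # ElementTree supports [@attr='value'] predicates; int() keeps the query well-formed.
    return [m.get("irn") for m in root.findall(f"./record[@irn='{int(irn)}']/multimedia")]


if __name__ == "__main__":
    print(multimedia_for(SAMPLE, 1))
```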