The Natural History Museum http://www.nhm.ac.uk. Speaker: Charles Hussey Science Data Co-ordinator Department of Information and Library Systems c.hussey@nhm.ac.uk. The Trustees of The Natural History Museum, 2002 . Data Access - challenges and opportunities.
Data Access - challenges and opportunities
• Move towards networks connecting distributed sources
• Two components to this presentation:
  • Start by drawing upon work for the European Natural History Specimen Information Network (a personal view of what is achievable)
  • Then look at some of the approaches we have taken within The NHM
Acknowledgements
• Nicolas Bailly, MNHN Paris, ENHSIN
• David Gee, originator of DSML
• Dilshat Hewzullah, NHM, DSML & Querying distributed databases
• Anne Hume, NHM, Online databases and DSML
• Andrew Jones, University of Cardiff, SPICE for Species 2000
• Mike Lowndes, NHM, Museum Information Locator System
• Rachel Perkins, NHM, Collections Level Descriptions
• Mike Sadka, NHM, Fast-track programme
• Darrell Siebert, NHM, Fish Collection Database
• Chris Sleep, NHM, DSML
• Neil Thomson, NHM, BioCASE
Nature of Data
What do we have to deal with?
First Challenge: Integrating disparate sources
• NHM Survey in 2000: 87 institutions responded
  • 33 different products
  • 40% using bespoke solutions
  • 5 using spreadsheets
• BioCISE Survey in 1998/99: 292 institutions responded
  • 60 different products
  • 75% using bespoke solutions
  • Only 8% providing web access to unit level data
Nature of Data
First Challenge: Integrating disparate sources
Do data providers have the means to:
• Implement and maintain a local Internet server providing 24-hour-a-day access?
• Compile metadata (collections level or unit level)?
• Supply additional data (such as resolving localities or providing elements of higher taxonomy)?
• Maintain quality of datasets?
• Construct views of their data or implement wrappers?
• Handle version control?
Nature of Data
Second Challenge: Comparing like with like
• Authorities for names
• Personal names
• Geographic co-ordinates
• Place names
• Language and spelling
Architectures
1. Single client/server database used by all providers and users
2. Central summary system
3. Central Gateway to distributed databases
4. Peer-to-peer databases
5. Web directory pointing to data sources
Architectures
1. Single client/server database used by all providers and users
• Single database, subscribers have local client
• Allows detailed and complex interaction with data
• Example: NHM Palaeontology Collections Management System
• Example: Packages for Observers – Recorder 2000, MapMate
Architectures
2. Central summary system
• Contributors maintain their own systems and post copies of data to a centrally maintained database
• Example: NBN Species Dictionary
Architectures
3. Central Gateway to distributed databases
• No central database
• …but “Common Access System” may store metadata
• Example: Species 2000
• Example: Biodiversity on the Web
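The gateway idea can be illustrated with a minimal Python sketch. It is not the Species 2000 or Biodiversity on the Web implementation: the function name `query_gateway`, the provider functions `museum_a` and `museum_b`, and the record fields are all hypothetical stand-ins. It only shows the pattern of forwarding one query to several independent sources and merging the answers without keeping a central copy of the data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A provider is any callable that takes a search term and returns records.
# In a real gateway this would wrap a network call to a remote database.
Provider = Callable[[str], List[dict]]


def query_gateway(providers: Dict[str, Provider], term: str) -> List[dict]:
    """Forward one query to every provider and merge the results.

    Nothing is stored centrally; each hit is tagged with its source.
    """
    def ask(item):
        source, search = item
        try:
            return [dict(hit, source=source) for hit in search(term)]
        except Exception:
            # Assumption: an unreachable provider is skipped, not fatal.
            return []

    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        batches = list(pool.map(ask, providers.items()))
    return [hit for batch in batches for hit in batch]


# Hypothetical in-memory stand-ins for two remote databases.
def museum_a(term: str) -> List[dict]:
    return [{"name": term, "catalogue_no": "A-001"}]


def museum_b(term: str) -> List[dict]:
    raise ConnectionError("provider offline")


results = query_gateway({"museum_a": museum_a, "museum_b": museum_b},
                        "Salmo trutta")
```

Skipping a failed provider is one design choice among several; a gateway could equally report which sources did not answer, so that users know the result set is incomplete.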
Biodiversity on the Web Selection of Searchable Databases
Architectures
4. Peer-to-peer databases
• Multiple Z39.50 servers and clients
• Example: Species Analyst
• Example: AHDS
Architectures
5. Web directory pointing to data sources
• Essentially, a portal
• Example: BIODIV
Other Issues
• Scalability
• Sustainability
• Access
• Quality Control
  • Terminology Control
  • “Gaps” in data:
    • Still parts of collection not yet databased
    • Collection not suitable for databasing at unit level
    • Inadequate data dictionary
    • Data not available for a specimen
    • Data needs interpretation
  • Indicators for Quality
A Case in Point: Wrapping a dataset for ENHSIN Pilot
• Copy table from Access to SQL Server
• Restructure table to add “new” fields
• Perform conversions:
  • Place = Waterbody + Locality (verbatim) + Site Ref.
  • Split Collection date to DAY, MONTH, YEAR
  • Convert Lat & Long to decimal degrees
  • Convert Altitude to metres and deal with altitude ranges
  • Shape = Material + “(” + Preservation Method + “)”
  • Collector = Collector Surname + Initials + Title
  • Determiner = Determiner Surname + Initials + Title
• Populate blank fields with static data by creating view (e.g. for Kingdom, Collection Name, Contact Info.)
• Delete fields not required after conversion
• Rename fields to match ENHSIN element names
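The conversion steps above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used for the pilot (which worked on tables in SQL Server): the input column names (`Waterbody`, `SiteRef`, `CollectionDate` and so on), the ISO-style date strings, the `(degrees, minutes, seconds, hemisphere)` tuples, the altitude syntax and the output key names are all hypothetical.

```python
import re
from typing import Optional, Tuple

FEET_TO_METRES = 0.3048


def dms_to_decimal(degrees: float, minutes: float = 0.0,
                   seconds: float = 0.0, hemisphere: str = "N") -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere.upper() in ("S", "W") else value


def split_date(text: str) -> Tuple[Optional[int], Optional[int], int]:
    """Split 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into (day, month, year)."""
    parts = [int(p) for p in text.split("-")]
    month = parts[1] if len(parts) > 1 else None
    day = parts[2] if len(parts) > 2 else None
    return day, month, parts[0]


def altitude_to_metres(text: str) -> Tuple[float, float]:
    """Parse '1000 ft' or '300-450 m' into a (min, max) pair in metres."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)(?:\s*-\s*(\d+(?:\.\d+)?))?\s*(m|ft)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised altitude: {text!r}")
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    factor = FEET_TO_METRES if match.group(3) == "ft" else 1.0
    return round(low * factor, 1), round(high * factor, 1)


def join_nonblank(*parts: str, sep: str = ", ") -> str:
    """Concatenate the non-empty parts, skipping blanks."""
    return sep.join(p.strip() for p in parts if p and p.strip())


def wrap_record(row: dict) -> dict:
    """Apply the conversions to one row (hypothetical column names)."""
    day, month, year = split_date(row["CollectionDate"])
    alt_min, alt_max = altitude_to_metres(row["Altitude"])
    return {
        "Place": join_nonblank(row["Waterbody"], row["Locality"],
                               row["SiteRef"]),
        "Day": day, "Month": month, "Year": year,
        "Latitude": dms_to_decimal(*row["Lat"]),
        "Longitude": dms_to_decimal(*row["Long"]),
        "AltitudeMin": alt_min, "AltitudeMax": alt_max,
        # A space before the bracket is added here for readability.
        "Shape": f"{row['Material']} ({row['PreservationMethod']})",
        "Collector": join_nonblank(row["CollectorSurname"],
                                   row["CollectorInitials"],
                                   row["CollectorTitle"], sep=" "),
    }
```

Returning altitude as a minimum and maximum pair is one way to "deal with altitude ranges": a single value simply yields the same number twice.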
NHM Initiatives
• Imaging of Primary Sources
  • Zoology Accession Ledgers
  • Entomology Card indexes (VIADOCS project)
• Rapid Data Entry
  • Fish Collection
  • Botany Pilot
• Collections Level Description
  • Darwin Centre
  • Entomology Index to Collections
• Integrated Access
  • Data Locator
Links
• ENHSIN: http://www.nhm.ac.uk/science/rco/enhsin/index.html
• SPICE Project: http://www.systematics.reading.ac.uk/spice
• Biodiversity on the Web: http://www.biodiversity.org.uk/ibs/
• Species Analyst: http://habanero.nhm.ukans.edu
• NBN Species Dictionary: http://yaw.nhm.ac.uk/nhm/
• AHDS Gateway: http://prospero.ahds.ac.uk:8080/ahds_live/
• BIODIV: http://www.br.fgov.be/biodiv/
• NHM Collection Level Descriptions: http://www.nhm.ac.uk/cld/index.shtml
• NHM Data Locator: http://internt.nhm.ac.uk/cgi-bin/locator/
• Online databases at NHM: http://www.nhm.ac.uk/science/projects.html