
CSIRO Marine Research Divisional Data Centre Current and Future Activities

CSIRO Marine Research Divisional Data Centre Current and Future Activities Tony Rees, Data Centre Manager April 2004. Talk outline. General Divisional context – past and present Data Centre approaches and tools – including MarLIN, Data Warehouse & Trawler, CAAB, C-squares, and OBIS



  1. CSIRO Marine Research Divisional Data Centre Current and Future Activities Tony Rees, Data Centre Manager April 2004

  2. Talk outline
     • General Divisional context – past and present
     • Data Centre approaches and tools – including MarLIN, Data Warehouse & Trawler, CAAB, C-squares, and OBIS
     • Data Centre services to CMR projects
     • Cleveland-specific issues
     Target audience and level of talk
     • Introductory / overview level, some examples but not full detail
     • Aimed at CMR staff in general, project managers, plus project metadata staff
     • Database designers and application developers will find material of interest, but will need separate, more detailed information.

  3. What are the Division’s chief assets?
     • Our people (and their intellectual capabilities)
     • Our hardware (collecting platforms etc.) and technologies
     • Our data – newly collected, plus historic data
     How do we manage our data assets?
     • Mixture of good, moderately good, and not good at all
     • “good” – well documented; details in searchable catalogue; appropriate/current formats; online access (to appropriate users); ongoing curation
     • “moderately good” and “not good” depart from the above, to lesser or greater degree
     • Data Centre curates selected datasets on behalf of the Division, others reside long-term in projects
     • Data Centre also maintains “MarLIN” – the Division’s data catalogue (metadata system)

  4. Overview of metadata, data systems – national context
     [Diagram labels: Australian Spatial Data Directory (ASDD) – national cross-agency metadata gateway; metadata systems (NOO, GA, AAD, AIMS, CMR etc.; Neptune, MarLIN) describe / point to the data holdings (AAD data, GA data, NOO data, AIMS data, CMR data etc.); example of 3rd party data (CMR copy)]
     • Search via ASDD – search across multiple agencies, basic functionality
     • Search via MarLIN – search only CMR holdings, but extra functionality (also view “CMR internal” records not visible to external users)

  5. The Card Index ...

  6. MarLIN Marine Laboratories Information Network Divisional Data Catalogue (metadata system)

  7. What is in MarLIN?
     • Descriptions of <2,000 Divisional datasets (including c. 1,000 held by the Data Centre)
     • Individual MarLIN records are searchable by subject, keyword, CMR project, geographic region, time period, biological species, voyage reference, and more
     • Contain metadata (“data about data”) in a common structure (ANZLIC format plus CMR-specific additional fields)
     • Can contain links to images, related documents, data files, and other metadata records
     • “Quick maps” (using c-squares data footprints, see later) can indicate the spatial extent of the data
     Who creates MarLIN metadata records?
     • Records are created/maintained by the data custodians, who best understand the data and associated useful resources, using an online metadata entry form (Data Centre staff can assist with this process)
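As a rough illustration of what field-based searching over such records involves, the Python sketch below models a record with a few of the searchable fields listed on this slide and filters on them. The class, field names, and sample records are hypothetical; they are not MarLIN's actual schema or ANZLIC element names.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical, much-simplified dataset description."""
    title: str
    custodian: str
    keywords: set = field(default_factory=set)  # stored in lower case
    region: str = ""
    start_year: int = 0
    end_year: int = 0


def search(records, keyword=None, region=None, year=None):
    """Return titles of records matching every criterion that was supplied."""
    hits = []
    for rec in records:
        if keyword is not None and keyword.lower() not in rec.keywords:
            continue
        if region is not None and rec.region != region:
            continue
        if year is not None and not (rec.start_year <= year <= rec.end_year):
            continue
        hits.append(rec.title)
    return hits


# Invented sample records, for illustration only.
CATALOGUE = [
    MetadataRecord("Example CTD casts", "Data Centre",
                   {"ctd", "temperature"}, "Tasman Sea", 1985, 1990),
    MetadataRecord("Example trawl survey", "Project A",
                   {"catch", "trawl"}, "Tasman Sea", 1995, 1997),
    MetadataRecord("Example mooring array", "Project B",
                   {"currents"}, "Coral Sea", 1996, 1998),
]

print(search(CATALOGUE, region="Tasman Sea"))
print(search(CATALOGUE, keyword="Currents", year=1997))
```

Criteria that are left as `None` are ignored, so the same function serves a single-field lookup or a combined query.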

  8. revised Data Centre website (extract)

  9. Sample MarLIN content Alphabetical dataset lists

  10. Sample MarLIN content Alphabetical dataset lists Indexes by keyword, etc.

  11. Sample MarLIN content Alphabetical dataset lists Indexes by keyword, etc. Brief dataset details

  12. example search result ... (etc.)

  13. Viewing the full metadata record produces ... with clickable link to show dataset extent using c-squares: (etc.)

  14. (Quick look at the ASDD)

  15. What’s in it for me / us?
     • Allows CMR staff / others to know what data we have already, what we are collecting (or plan to collect), what we do not have (gap analysis) – facilitates data re-use, avoids duplicate acquisition, fosters collaborations
     • Permits inspection of relevant data documentation in order to assess data usefulness / completeness / quality, inspect thumbnails of data coverage, etc.
     • Gives a contact person and/or electronic access for the data, via a standard entry point
     • Provides dissemination of project scientific activities into a new “information space” – online searching via the ASDD, indexing by web search engines, possible future one-csiro system (only don’t hold your breath for the latter)
     • Can be feasible for projects to utilise MarLIN to catalogue / access their own data – use MarLIN’s built-in search capability rather than re-invent.

  16. Data Warehouse and Data Trawler

  17. 2000 onwards – databasing of “all” Data Centre holdings into a Divisional Data Warehouse, accessed by a custom “Data Trawler” application
     • Historic holdings of Hydrology (bottle chemistry) and CTD data – 200,000 HYD analyses, 10,000 CTD casts, from hundreds of research voyages and coastal stations
     • Underway data for 175 research voyages (10 million observations) – depth, position, time, meteorological variables, sea temperature, salinity, fluorescence
     • Biological (catch composition) data from 85 voyages – 10,000 trawls, 240,000 individual species records (number or weight caught)
     • Currents data from 548 moored current meters (3 million readings)
     • ADCP data, some old hydrology data still in archives, awaiting migration to on-line Warehouse system. Also note, c. 50% of Divisional catch data is not held by the Data Centre at this time (probably still with original investigators)

  18. example Data Trawler Screens

  19. current Warehouse content accessible via Data Trawler: HYD and CTD data – all years

  20. current Warehouse content accessible via Data Trawler: moorings data – all years

  21. current Warehouse content accessible via Data Trawler: catch data – all years

  22. What’s in it for me / us?
     • Provides access to centrally held data on a self-serve basis, via a standard web browser
     • Allows queries to be constructed by data type, region, time period, species, voyage ...
     • Contains the actual data, but not text information (the latter is in MarLIN)
     • Permits retrieval of data across multiple projects, as integrated result set in a common format
     • Provides preview / mapping of spatial extents of result sets generated (closer to true web GIS facility cf. MarLIN, which is more of a quick “thumbnail” facility)
     • Data are provided in csv / spreadsheet compatible format, suitable for download to the user’s own machine for further manipulation.
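To give a feel for the "further manipulation" step once a csv result set is on the user's own machine, here is a minimal Python sketch that filters rows by a latitude/longitude box and averages one variable. The column names, units, and sample values are invented for illustration; a real Data Trawler export may be laid out differently.

```python
import csv
import io

# Invented sample rows; column names and units are assumptions.
SAMPLE_CSV = """voyage,latitude,longitude,date,depth_m,temperature_c
V01,-42.5,148.2,1996-03-01,10,15.2
V01,-42.5,148.2,1996-03-01,100,12.8
V02,-35.0,151.5,1997-07-15,10,18.4
V03,-43.1,147.9,1998-01-20,10,16.0
"""


def rows_in_box(csv_text, south, north, west, east):
    """Return rows whose position falls inside the box.

    Assumes the box does not cross the 180-degree meridian.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    selected = []
    for row in reader:
        lat = float(row["latitude"])
        lon = float(row["longitude"])
        if south <= lat <= north and west <= lon <= east:
            selected.append(row)
    return selected


def mean_temperature(rows):
    """Average the temperature column, or None if there are no rows."""
    temps = [float(r["temperature_c"]) for r in rows]
    return sum(temps) / len(temps) if temps else None


box_rows = rows_in_box(SAMPLE_CSV, south=-44.0, north=-42.0, west=147.0, east=149.0)
voyages = sorted({r["voyage"] for r in box_rows})
print(len(box_rows), "rows from voyages", voyages)
print("mean temperature:", mean_temperature(box_rows))
```

Reading from a file instead of the inline sample only requires passing an open file object to `csv.DictReader` in place of the `io.StringIO` wrapper.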

  23. Systems considered thus far ...
     [Diagram labels: Remote Applications; Divisional Systems; Austr. Spatial Data Directory (ASDD); “Data Trawler” application; Divisional Data Warehouse; “MarLIN” Data Catalogue; Off line archived data; Hyperlinked documents, graphics, etc.; Project-based data holdings]

  24. CAAB Codes for Australian Aquatic Biota master taxonomic database

  25. 1999-current – upgrading of “CAAB” master taxon management system for the Division
     • CAAB (Codes for Australian Aquatic Biota) is a database of species names and codes, now covering >25,000 marine species in Australian waters
     • Codes are standardised species identifiers for use in Divisional databases (species names may change, codes are intended to be constant)
     • “Quick maps” of all catch data in the Warehouse (by species) have been associated with the relevant CAAB record; also predicted species ranges for c. 3,000 fish species
     • Individual maps form clickable interface(s) to retrieve corresponding data items (individual catch records) from the Warehouse and display them in a web page

  26. web-accessible version of CAAB

  27. web-accessible version of CAAB

  28. web-accessible version of CAAB

  29. What’s in it for me / us?
     • Codes are a standard storage and interchange format for taxonomic information in CMR and other regional databases
     • CAAB website and derived tables allow matching of codes to names, and vice versa
     • Check correct spelling of species names, full citation, generate Australian species lists per genus / family / larger category
     • Links to pictures and maps of CMR data distribution, where available
     • “Quick maps” form clickable front end to Data Warehouse queries
     • Also provides access to most recent predicted species range in many cases
     • Potentially supports “what lives here” queries from predicted species ranges and specified depths (fishes only, at present time).
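The point that names may change while codes stay constant can be sketched in a few lines of Python. Everything here is a placeholder: the codes, the species names, and the table layout are invented and do not reproduce real CAAB entries or the CAAB database structure.

```python
# Placeholder code -> currently accepted name (not real CAAB data).
CURRENT_NAME = {
    "00000001": "Exampleus primus",
    "00000002": "Exampleus secundus",
}

# Superseded names pointing at the code they now belong to.
OLD_NAMES = {
    "Oldgenus primus": "00000001",
}


def code_for_name(name):
    """Match a current or superseded name to its code (case-insensitive)."""
    wanted = name.lower()
    for code, current in CURRENT_NAME.items():
        if current.lower() == wanted:
            return code
    for old, code in OLD_NAMES.items():
        if old.lower() == wanted:
            return code
    return None


def name_for_code(code):
    """Return the currently accepted name for a code, or None."""
    return CURRENT_NAME.get(code)


def total_weight_by_name(records):
    """Sum catch weight per species, labelling each code with its current name."""
    totals = {}
    for rec in records:
        name = name_for_code(rec["code"]) or "unknown code " + rec["code"]
        totals[name] = totals.get(name, 0.0) + rec["weight_kg"]
    return totals


# Catch records store only the code, so a later name change needs no data edits.
catch_records = [
    {"code": "00000001", "weight_kg": 12.5},
    {"code": "00000002", "weight_kg": 3.0},
    {"code": "00000001", "weight_kg": 7.5},
]

print(code_for_name("oldgenus primus"))
print(total_weight_by_name(catch_records))
```

Because the catch records carry only the code, renaming a species means updating one row in the name table rather than every record that mentions it.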

  30. C-squares Concise Spatial Query and Representation System spatial indexing and mapping utility

  31. “C-squares” mapping / spatial indexing utility
     • Original Data Centre creation, 2001 onwards
     • Mainly a developer’s tool
     • Permits “lightweight” spatial indexing, queries, and web mapping from a standard text-based system (no GIS required)
     • Currently used in 4 CMR and 3 international systems
     • (Tony Rees can supply more details if interested).
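The talk does not give the encoding, so the following Python sketch is only an illustration of how text-based spatial indexing of this kind can work. It assumes the published c-squares notation at two resolutions (a WMO-style 10-degree square code, optionally followed by a three-digit 1-degree subdivision); the function names and sample points are hypothetical, and the c-squares specification should be checked before relying on the details.

```python
def csquare(lat, lon, resolution=1):
    """Return a c-squares-style code for a point at 10 or 1 degree resolution.

    Sketch only: assumes the published notation, not taken from the talk.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    if resolution not in (10, 1):
        raise ValueError("this sketch supports resolution 10 or 1 only")

    # Global quadrant digit: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5

    # Clamp so the poles and the 180-degree meridian fall in the last cell.
    abs_lat = min(abs(lat), 89.999999)
    abs_lon = min(abs(lon), 179.999999)

    lat_tens = int(abs_lat // 10)
    lon_tens = int(abs_lon // 10)
    code = f"{quadrant}{lat_tens}{lon_tens:02d}"
    if resolution == 10:
        return code

    # 1-degree subdivision: intermediate quadrant (1-4), then unit digits.
    lat_unit = int(abs_lat) % 10
    lon_unit = int(abs_lon) % 10
    intermediate = 2 * (lat_unit // 5) + (lon_unit // 5) + 1
    return f"{code}:{intermediate}{lat_unit}{lon_unit}"


def names_in_square(index, square):
    """Spatial query by plain string prefix: no GIS needed."""
    return sorted(name for name, code in index.items() if code.startswith(square))


# Hypothetical sample points (name, latitude, longitude).
points = [
    ("Hobart", -42.88, 147.33),
    ("Sydney", -33.87, 151.21),
    ("Maria Island", -42.63, 148.08),
]
index = {name: csquare(lat, lon) for name, lat, lon in points}

print(index)
print(names_in_square(index, "3414"))
```

Because every finer code begins with the code of the coarser square that contains it, a "which records fall in this 10-degree square" query reduces to a string prefix match, which is what makes the approach workable in an ordinary text or relational system.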

  32. OBIS Ocean Biogeographic Information System www.iobis.org

  33. OBIS – Ocean Biogeographic Information System
     • Operated by an international consortium, including CMR representation
     • Like a “super CAAB” for the world, but with names only (not codes)
     • Can currently access point data for 20,000 marine species from c. 20 institutions worldwide (2 million records), plus lists of names awaiting data, and returns integrated result sets (like Data Trawler)
     • Many aspects similar to CAAB, including “Quick maps”, click-on-map spatial queries, OBIS taxonomic groups, and more (Data Centre staff did the interface and query logic)
     • CMR catch data to be visible via the system in due course.

  34. Data Centre Services to CMR projects

  35. Who are we?
     • Tony Rees (Hobart) – Data Centre manager; MarLIN, CAAB, C-squares technical support & development; national & international connections; project-level advice (metadata)
     • Pamela Brodie, Leanne Wilkes (Hobart) – Data Warehouse, Data Trawler support and data loading; project-level advice (databases)
     • Miroslaw Ryba (Hobart) – Oracle support; ships biological data collection suite
     • Terry Byrne (Hobart) – National Facility Data Librarian; data requests; data archiving
     • Hiski Kippo (Floreat) – project-level liaison, DC representation (WA)
     • Steven Edgar (Cleveland) – project-level liaison, DC representation (QLD)

  36. “On the ground” DC services to CMR projects
     • Advice and assistance to CMR project staff – metadata entry, database design, general data management issues
     • Maintaining the Division’s Oracle systems, and provision of Oracle advice and web-based help
     • Servicing/forwarding data requests as appropriate
     • Migrating project data to the Data Warehouse, for integration with other relevant data holdings, and archiving data to offline media as required
     • Looking at whole-of-Division issues such as data access and exchange policies, engagement with relevant national and international data operations, cross-CSIRO data access, etc.
     • New Data Management officers in Floreat (2002) and Cleveland (2004)
     • Developing interest in GIS data layers and systems e.g. ArcSDE, ArcIMS
     • Continuing to advance existing DC systems on three fronts – tools, content, and connectivity (internally, nationally, internationally).

  37. Cleveland-specific issues ...
     • Steven Edgar has an advisory role for Data Management in projects at the Cleveland site (project personnel actually do the project-level management); can assist with database design, etc., also some/all Oracle administration needs
     • Steve’s time (or portions of it) can be spent on migrating project data to our central warehouse/trawler system, also assisting project staff with metadata entry as needed
     • Steve brings new expertise in GIS systems to the Data Centre; will take an interest in cross-project / cross-Divisional GIS issues and progress where possible
     • Steve can act as conduit for technology/content/expertise transfer in 2 directions (DC Systems/tools > CMR projects and vice versa) – also the “eyes and ears” of the Data Centre in Cleveland to bring local issues to Hobart attention as needed
     • Additional Hobart-based staff are only an email or phone call away if they can be of assistance.

  38. Summary – an idealised “data life cycle” at CMR
     [Diagram labels: Project starts; PSS; administrative details; project overview; project data; interim documents, graphics, etc.; project data repository; Project-based data holdings; Persistent project db’s; Project completed; published output; Off line archived data; Divisional Data Warehouse; “Data Trawler” application; “MarLIN” Data Catalogue]

  39. towards “best practice” data management at project level ...
     • Projects should be recording the existence of their data in MarLIN – ideally sooner rather than at end of project
     • Data should eventually be migrated off PCs into Divisional systems
     • As much relevant data as possible should be in the Warehouse
     • Effort should be made to produce definitive / final version of the data
     • Data Centre can help with archiving for closed projects
     • Data Warehouse table structure, and other Divisional databases, can provide starting points / examples for project level databases
     • Taxonomic / survey data recording should employ CAAB codes as a Divisional standard
     • ... refer Data Centre internal website and local Data Centre person/s for additional information.

  40. Some action items / ideas for discussion ...
     • Upgrade MarLIN content to reflect the true data holdings of the Division (augmented with project descriptions as available)
     • Look into migrating more “completed” project datasets into centralised (Data Centre) holdings / systems
     • Locate as much as possible of the “missing” catch data, to add to present Warehouse content
     • Obtain clearance as needed to make CMR catch data visible to the outside world (currently, it is all intranet-only) via Data Trawler and other linked systems (CAAB, OBIS, others)
     • Assist project staff with pressing data management issues and work to ensure good technology transfer for database design, etc.
     • Work with key project staff to progress the usefulness of the “new” web-enabled GIS systems across appropriate datasets, for the benefit of multiple users
     • Identify needs to digitise important non-digital data holdings (notebooks, field log sheets etc.) and assist in seeking resources to digitise them.

  41. Feedback / discussion time ...

  42. Summary of core Data Centre components as at April 2004
     [Diagram labels: Remote Applications; Divisional Systems; Off line archived data; Hyperlinked documents, graphics, etc.; Project-based data holdings; Divisional Data Warehouse; external c-squares users – FishBase, OBIS, others; Distributed AODC? OBIS? other?]
     • “C-squares” Spatial Indexing / Mapping System – www.marine.csiro.au/csquares/
     • “CAAB” Taxonomic Database – www.marine.csiro.au/caab/
     • OBIS (e.g.) – www.iobis.org/
     • “Data Trawler” application – www.marine.csiro.au/warehouse/jsp/loginpage.jsp
     • “MarLIN” Data Catalogue – www.marine.csiro.au/marlin/
     • Austr. Spatial Data Directory (ASDD) – asdd.ga.gov.au/asdd/
