Making and Identifying Digital Objects: the CUGIR 2.0 Approach Metadata Working Group 10/25/02 Jon Corson-Rikert Elaine Westbrooks Adam Chandler
Overview • CUGIR 1.0 • Moving Towards Solutions • CUL Internal Grant • Web Map Cap Grant • Web Mapping • Geodatabase-ORACLE SDE • CUGIR 2.0 • Risks • Goals & Summary
CUGIR 1.0 Assumptions • 1995 assumptions are inadequate • Geographic data in NY available by county or USGS quadrangle map • Hosts a limited set of themes • Hosts single versions of themes • Digital resources indexed and retrieved based on file naming conventions
CUGIR 1.0 Implementation • File naming convention combines location, thematic content, and data format • 001hya.gz • Same naming conventions for metadata • Data, metadata & documentation zipped together • Directory structure • One directory per county (62) • Not practical for all data
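To illustrate how a name such as 001hya.gz can carry location, theme, and format in one string, here is a minimal sketch. The field widths (three characters for location, two for theme, one for format) are an assumption made for illustration; the slides do not spell out the actual layout, and `parse_file_name` is a hypothetical helper.

```python
from typing import NamedTuple


class CugirFileName(NamedTuple):
    location: str      # assumed: first three characters
    theme: str         # assumed: next two characters
    data_format: str   # assumed: final character of the stem
    extension: str


def parse_file_name(name: str) -> CugirFileName:
    """Split a CUGIR 1.0 style file name into its assumed components."""
    stem, _, extension = name.partition(".")
    if len(stem) != 6:
        raise ValueError(f"unexpected stem length in {name!r}")
    return CugirFileName(stem[:3], stem[3:5], stem[5:], extension)


parsed = parse_file_name("001hya.gz")
print(parsed)
```

A scheme like this leaves no room for a version or datum field, which is one way to see why the convention had to be stretched as holdings grew.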
Stresses to the CUGIR 1.0 Structure • Asymmetrical growth • Tompkins County is unique • File naming conventions stretched, adapted • Proliferation of related data themes • 7 themes could be hydrography • Inability to handle versioning • NAD27 vs NAD83 wetlands data
Final Motivations for CUGIR 2.0 (1) • Availability of new data series: watersheds • Requests to handle more localized data • Errors in zipping data, metadata • Difficulty maintaining large numbers of metadata files in multiple formats
Final Motivations for CUGIR 2.0 (2) • Difficulty collecting web usage statistics • Need to improve info retrieval • Need to improve CUGIR website/interface • Need to temporarily restrict access to data under requirements of the Patriot Act
Solutions- Library Internal Grant (2001-02) • Enhancing access to CUGIR: converting FGDC metadata to MARC, Dublin Core • Goals: • Make metadata discoverable via OPAC, WorldCat, & OAI • Provide persistent URLs for CUGIR metadata, data • Modeled on Nelson/Maly paper, “Smart Objects for Digital Libraries” SODA
Solutions- FGDC Web Map CAP Grant 2001-02 • Supplementing CUGIR metadata with online linkage pointers to web mapping service(s) • Users can display a map in a standard web browser without requiring a data download or GIS software • Interoperation with other web mapping services (WMS) • Vendor-neutral WMS format for HTTP requests • Display data in CUGIR from other WM services • Make CUGIR data available for other WM services
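As a rough illustration of the vendor-neutral HTTP request format, the sketch below builds a WMS 1.1.1 GetMap URL. The parameter names come from the OGC WMS specification; the host `example.org`, the layer name `roads`, and the bounding box are placeholders, not CUGIR values, and `build_getmap_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width=600, height=400,
                     srs="EPSG:4326", image_format="image/png"):
    """Assemble a WMS 1.1.1 GetMap request URL from its standard parameters."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder server, layer, and extent for illustration only.
url = build_getmap_url("http://example.org/wms", ["roads"],
                       (-76.7, 42.3, -76.2, 42.6))
print(url)
```

Because the request is an ordinary URL, any client that can fetch an image over HTTP can display the map, which is what makes interoperation between services practical.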
Changes Under the Hood • Relational database to support the OAI buckets • Currently MySQL; ORACLE next • Primary table uses the 3 column unique key for buckets • Mapsheet (location) – Monroe County, Town of Danby • Coverage (data theme) – Roads, Landfills • Data format – Shapefile, Arc Export, DRG, DEM, CAD • Updated versions treated as new data themes • Census 1995, 1998, and now 2000 data • Dynamically generate web pages • Java, JDBC, Tomcat Servlets and Java Server Pages
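The three-column key can be sketched as a composite primary key. This example uses Python's built-in sqlite3 module purely so it runs anywhere; the slides describe MySQL accessed through Java and JDBC. The table and column names are assumptions, while the sample values are taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE buckets (
        mapsheet    TEXT NOT NULL,  -- location
        coverage    TEXT NOT NULL,  -- data theme
        data_format TEXT NOT NULL,
        PRIMARY KEY (mapsheet, coverage, data_format)
    )
""")

rows = [
    ("Monroe County", "Roads", "Shapefile"),
    ("Monroe County", "Roads", "Arc Export"),
    ("Town of Danby", "Landfills", "Shapefile"),
    # Updated versions are entered as new themes, not as revisions.
    ("Monroe County", "Census 1998", "Shapefile"),
    ("Monroe County", "Census 2000", "Shapefile"),
]
conn.executemany("INSERT INTO buckets VALUES (?, ?, ?)", rows)


def insert_is_rejected(row):
    """Return True if the composite key blocks the insert."""
    try:
        conn.execute("INSERT INTO buckets VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_rejected = insert_is_rejected(("Monroe County", "Roads", "Shapefile"))
bucket_count = conn.execute("SELECT COUNT(*) FROM buckets").fetchone()[0]
print(duplicate_rejected, bucket_count)
```

Treating each version as its own theme keeps the key simple, at the cost of leaving the relationship between versions implicit.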
Bucket is the Digital Object • Web interface in CUGIR guides users to buckets table via searching/browsing • Functionality to display metadata, preview map, download data • Different “verbs” in the SODA model • NSDI Clearinghouse, OAI, and OPAC searches lead to same database table
Datafiles and Metadata in CUGIR 1.5 • Buckets table stores server, directory, data file name, and metadata file names • Data and metadata can be moved individually or en masse without requiring change to any of the front-end access modalities • Still requires that files exist on disk • We need to migrate away from file-based data storage for CUGIR 2.0
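A minimal sketch of why storing locations in the buckets table decouples the front end from the file system: the public identifier stays fixed while the stored server and directory change. The field names, the identifier format, the hosts, and the metadata file name are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BucketRecord:
    bucket_id: str       # hypothetical persistent identifier
    server: str
    directory: str
    data_file: str
    metadata_file: str


def data_url(record: BucketRecord) -> str:
    """Resolve the current download location from the stored record."""
    return f"http://{record.server}/{record.directory.strip('/')}/{record.data_file}"


original = BucketRecord("bucket-1", "old.example.org", "data/001",
                        "001hya.gz", "001hya.xml")

# Moving the files only requires updating the record, not the front end.
moved = replace(original, server="new.example.org", directory="archive/001")

print(data_url(original))
print(data_url(moved))
```

The front end only ever asks for the bucket and resolves the location at request time, so a move touches one row rather than every page that links to the file.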
GIS Technology for Data Storage • CUGIR is small potatoes • Largest current vector data ~30,000 records statewide (2000 Census Blocks) • Not considered “large” until > 10,000,000 records • Image data is larger, but services are available on national scale at the level of detail we plan for NY • Technology driven by military requirements
Storing CUGIR data in GeoDatabase (1) • For each data theme, merge individual county or quad data files into large statewide datasets • Data stored in Oracle tables managed by ESRI’s ArcSDE software • ArcSDE manages spatial indices and rewrites SQL to allow Oracle to do primary filtering based on spatial area of interest – very fast • Serve to web via ArcIMS • Datasets grouped into logical map services • 2000 Census data, agricultural data, elevation data
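The idea of primary filtering by area of interest can be illustrated with a plain bounding-box test. This is a generic sketch of the concept, not ArcSDE's actual implementation; the feature names and coordinates are invented.

```python
from typing import NamedTuple


class Envelope(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Envelope, b: Envelope) -> bool:
    """True if the two bounding boxes overlap or touch."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y)


def primary_filter(features: dict, area: Envelope) -> list:
    """Keep only features whose envelope overlaps the area of interest."""
    return [name for name, env in features.items() if intersects(env, area)]


# Invented envelopes for illustration.
features = {
    "A": Envelope(0, 0, 2, 2),
    "B": Envelope(5, 5, 6, 6),
    "C": Envelope(1, 1, 3, 3),
}
area_of_interest = Envelope(1.5, 1.5, 4, 4)

candidates = primary_filter(features, area_of_interest)
print(candidates)
```

A cheap envelope test like this discards most rows before any exact geometry comparison is needed; a spatial index avoids even scanning every envelope.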
Storing CUGIR Metadata in GeoDatabase (2) • Metadata Server included in ArcIMS product • Includes Z39.50 service manager • “NSDI Clearinghouse Node in a Box” • Not clear yet whether it will do what we need • Needs to support all elements and make them searchable • We need dynamic export to multiple formats (xml) • Adoption depends on how important spatial localization of metadata is
Advantages of GeoDatabases to Users (1) • Seamless coverage • Original geographic unit of compilation or distribution no longer relevant • Easily overlay data gathered by quad or watershed with data from county-based sources • Extraction server permits download of dynamically-generated zip files by arbitrary area as well as by attribute • Data may be accessed directly from users’ ESRI desktop mapping applications
Advantages of GeoDatabases to Library (1) • No longer maintaining thousands of data files • No longer maintaining 4x metadata files (text, html, sgml, xml) • Improved management and backup of data via Oracle and ArcSDE tools
Advantages of GeoDatabases to Library (2) • Support for serving raster & vector data • Raster catalogs of aerial photos even if not spatially rectified • Comparable speed to wavelet-compressed file formats • Access GIS data from ordinary Oracle queries • Gazetteer service to link data with maps using place names and geographic features • Supplement EnCompass functionality
Risks of GeoDatabase Approach • Many pieces must work together • Oracle, ArcSDE, ArcIMS, Apache, Tomcat Servlet Engine • Version inconsistencies may inhibit upgrades • Complex to set up • Web mapping is a very different interface requiring lots of client-side programming • ArcSDE and ORACLE require training • Uncertain server load from web mapping
Summary and Goals • Migrating away from the notion of CUGIR as a geographic data repository • Was a web front end with fixed back-end deliverables • CUGIR 2.0 will be a spatially-enabled service provider • Buckets are persistent digital objects providing access to dynamically updated metadata, data, and maps • CUGIR data may appear “live” in CUL/external solutions • Allow users to create their own digital resources drawing on a mix of CUGIR and externally-served data