Putting time into the GeoWeb: Data persistence in a web services environment Steve Morris NCSU Libraries July 23, 2008
Overview • Background to the digital preservation problem • Problems • Temporal data access issues • Capturing data state in a services or API context • Making the business case for older data • Preservation approaches • Future directions
Project background: North Carolina Geospatial Data Archiving Project • Partnership between university library (NCSU) and state agency (NCCGIA) • Under cooperative agreement with Library of Congress in NDIIPP national preservation program • Focus on state and local geospatial content in North Carolina (state demonstration) • Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories • Goal: Engage spatial data infrastructure (SDI) in data preservation and archiving • Demonstration repository as catalyst for an industry conversation
SDI role in data preservation • Data inventories support content identification • Metadata standards support discoverability and use • Content standards support data interoperability over time and help eliminate semantic confusion • Data exchange networks: • Minimize need to make contact • Add technical, administrative, descriptive metadata • Establish rights and provenance
Project roots: NCSU Libraries data directories • Tracking data, map servers, and web services since 2000 • Ranked 3rd in traffic among entry points to entire library website • Persistent identifiers: usage tracking; ID links used in other sites • Community help in site maintenance
County map and data services in NC 100 Counties in North Carolina
Carrboro, NC: Population 17,797 (2005 est.) • 24 downloadable GIS data layers • 6 web mapping applications • 4 WMS data layers • 9 downloadable PDF map layers
Downtown Raleigh Near State Capitol 1914 Sanborn Map Note: Percentages based on the actual number of respondents to each question
Downtown Raleigh Near State Capitol 1993 DOQQ
Downtown Raleigh Near State Capitol 1999 Wake County Ortho
Downtown Raleigh Near State Capitol 2005 Wake County Ortho
Imagery = Durable • Static • Simple structure • Mostly open formats Vector data = Volatile • Frequent update • Complex structure • Mostly proprietary formats (Image: Downtown Raleigh Near State Capitol, 2005 Wake County Ortho)
Data preservation points of failure • Data is not saved, or … • can’t be found, or … • media is obsolete, or … • media is corrupt, or … • format is obsolete, or … • file is corrupt, or … • meaning is lost Solutions: • Migration • Emulation • Encapsulation • XML
Problem: Data state in a web services or API-driven environment • How to capture records from decision-making processes? • How to capture data state as well as service state?
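One way to read "capturing data state as well as service state" is to record both the request sent to a service and a fingerprint of what came back at that moment. The following is a minimal sketch under stated assumptions, not the project's method: the function names, the record fields, and the example.org endpoint are hypothetical, and only the WMS 1.1.1 GetMap parameter names come from the standard.

```python
import hashlib
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height):
    """Build a WMS 1.1.1 GetMap request URL (the 'service state')."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


def make_capture_record(request_url, response_bytes, captured_at=None):
    """Pair the request with a fingerprint of the response (the 'data state').

    The field names are hypothetical; an archive would define its own schema.
    """
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    return {
        "request_url": request_url,
        "captured_at": captured_at.isoformat(),
        "sha256": hashlib.sha256(response_bytes).hexdigest(),
        "size_bytes": len(response_bytes),
    }
```

A record like this does not preserve the underlying data, but it documents what a service returned at the point of decision-making, so a later response can be compared against it.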
Problem: Temporal data unavailability • Industry focus on “latest and greatest” data • “Kill and fill” as a common approach to data management (past versions of vector data lost) • Not just data loss, also loss of memory about data: • Of superseded county orthophoto flights in NC, only 22% are recorded in the state’s GIS inventory • Some older inventories are only available through the Internet Archive
Availability of older orthoimagery on county map servers in NC • Only 30% of superseded digital ortho flights are accessible through county map servers
Availability of older orthoimagery on county map servers in NC • 23 counties in NC publish ortho WMS services • 0 counties in NC publish superseded orthos as WMS services
Problem: Making the business case for archiving • Use case: Land use and impervious surface change analysis (Image panels: 1993, 1998, 1999, 2002, 2005)
Building the preservation business case • Land use change analysis • Site location analysis • Real estate trends analysis • Disaster response • Resolution of legal challenges • Impervious surface change mapping
Planned 2008 NC business case survey • Case description • Resources/Scope of effort • Benefits and results • Fiscal assessment Based on previous experience, pending projects, examples of when a project could have been served better if archival data were available
Geospatial data preservation challenges • Producer focus on current data • Future support of data formats in question • Inadequate or nonexistent metadata • Spatial databases • Complex data objects (multi-file, multi-format) • Shift to web services-based access (ephemeral data) • Difficult to capture data state at point of decision-making
Preservation approaches: Temporal data snapshots Issue: How frequently should county and municipal vector data layers be captured in archives? Parcels, centerlines, jurisdictions, zoning, … Parcel Boundary Changes 2001-2004, North Raleigh, NC
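The capture-frequency question can be framed as a simple policy: take a new snapshot only when a layer has actually changed and a minimum interval has passed since the last capture. The sketch below illustrates that idea only. The function name and the 90-day default are hypothetical, not survey findings, and hashing an exported file is a stand-in for whatever change detection a custodian really uses.

```python
import hashlib
from datetime import date


def should_capture(current_bytes, last_digest, last_capture, today,
                   min_interval_days=90):
    """Decide whether to archive a new snapshot of a vector layer.

    Returns (capture?, digest of the current content).
    min_interval_days is an illustrative default, not a recommendation.
    """
    digest = hashlib.sha256(current_bytes).hexdigest()
    if last_digest is None or last_capture is None:
        # Nothing archived yet: always take the first snapshot.
        return True, digest
    changed = digest != last_digest
    waited = (today - last_capture).days >= min_interval_days
    return changed and waited, digest
```

For example, a parcels layer edited ten days after the last capture would be skipped, while the same edit seen 120 days later would trigger a snapshot. Intermediate states between captures are lost under this policy, which is the trade-off the frequency question asks custodians to weigh.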
NC frequency of data capture surveys • How often should continually changing vector datasets be captured? • Tap into data custodian understanding of production patterns and uses • Tap into local innovation • Learn about local business drivers for data archiving • 2006 and 2008 surveys of NC cities and counties • 2008 survey of archival practice in state agencies in NC • Planned survey of data users in NC http://www.nconemap.com/AboutNCOneMap/tabid/289/Default.aspx#preservation
Preservation approaches: Desiccated data Complex data representations can be made more preservable (and less useful) through simplification
Preservation approaches: Desiccated data • Complex documents may be very hard to preserve over time • GIS project files • Layer definitions • Web services or API interactions • Image outputs capture some sense of the final product, but lose the underlying data intelligence
Desiccated data: PDF and GeoPDF • Cartographic outputs – analogous to the old paper maps • Combined datasets, with data models, classification, symbolization, annotation • More data intelligence than in images
Desiccated data: Geospatial PDF • Explosion of geospatial PDF content in past few years • Standards issues • GeoPDF: proprietary TerraGo technology • PDF an open ISO standard • Open PDF variants created through ISO standards process (PDF/E, PDF/X, PDF/A, …) • PDF content retained in addition to, NOT instead of, data
Preservation approaches: Historical WMS tile caches? No market for archived tiles without standard way to describe tiles and without commonly used tiling schemes
Preservation approaches: Historical WMS tile caches? • Tile cache systems developed for more responsive WMS or mapping systems • WMS Tile Caching (WMS-C) incubated by OSGeo • WMTS (Web Map Tiling) OGC white paper • No explicit temporal component in WMS-C or WMTS • To what extent do temporal geospatial systems become video-like?
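To make the missing temporal component concrete, the sketch below adds a capture year to a tile address. It is a hypothetical layout and not part of WMS-C or WMTS: the path scheme and function names are assumptions, and the tile arithmetic is the widely used Web Mercator XYZ convention rather than anything the slides specify.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Convert WGS84 lon/lat to XYZ tile indices (Web Mercator convention)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def temporal_tile_path(layer, capture_year, zoom, x, y, ext="png"):
    """Hypothetical cache path with the capture year as an extra dimension."""
    return f"{layer}/{capture_year}/{zoom}/{x}/{y}.{ext}"
```

Under this layout, the same location in the 1999 and 2005 orthos resolves to paths that differ only in the year segment, so stepping through years at a fixed tile is what would give a historical cache its video-like quality.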
Old maps coming into the GeoWeb … Pronounced local agency interest in archiving, digitizing, and geo-referencing older analog products
New archiving interest: Location-based content • Oblique imagery • Street views • 3D images • Present-day value in location-based services and mobile applications
New archiving interest: Location-based content • Future value of non-spatial place-based imagery as cultural heritage resource • More descriptive of place and function than spatial imagery
Moving forward • GICC Archival and Long-Term Access Committee • Geo Multistate Archive and Preservation Partnership (GeoMAPP) • OGC Data Preservation Working Group
Community response to data archiving challenge • Nov. 2007: NC Geographic Information Coordinating Council (GICC): Ten Recommendations in Support of Geospatial Data Sharing released • Recommendation: “Establish archive and long term data access strategies” • Suggested best practices include: “Establish a policy and procedure for the provision of access to historic data, especially for framework data layers.”
GICC Archival and Long-Term Access Committee • Initiated in response to agency requests for guidance on temporal data management • Federal, state, regional, and local agency representation • Key focus • Best practices for data snapshots and retention • State Archives processes: appraisal, selection, retention schedules, etc. • Who, What, Why, When, Where, How
Geo Multistate Archive and Preservation Partnership (GeoMAPP) • Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA), State Archives of NC, with Library of Congress • Partners: • State geospatial organizations of Kentucky and Utah • State Archives of Kentucky and Utah • NCSU Libraries in catalytic/advisory role • State-to-state and geo-to-Archives collaboration • 2 year project: Nov. 2007-Dec. 2009 • Archives as part of Spatial Data Infrastructure
OGC Data Preservation Working Group • Formed Dec. 2006 • Engage archival community • Find points of intersection with other OGC activities: • GML for archiving • Content packaging • Large scale data transfers • Time in decision support
The Content Packaging Problem • Files: • Multi-file dataset • Georeferencing • Metadata file • Symbols file • Additional documentation • License • Disclaimer • More metadata: • ISO/FGDC • Acquisition metadata • Transfer metadata • Ingest metadata • Archive rights • Archive processes • Collection metadata • Series metadata • Metadata Exchange Format (MEF) in GeoNetwork is a form of content packaging
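The packaging problem can be illustrated with a simple manifest that lists every file in a package, its role, and a checksum, so that a multi-file dataset can be checked for completeness and corruption after transfer. This is a sketch under stated assumptions: the manifest fields, role labels, and function names are hypothetical and do not reproduce MEF or any other packaging format.

```python
import hashlib
import json


def build_manifest(package_id, files):
    """Build a manifest for a package.

    files maps path -> (role, content_bytes); roles such as "data",
    "metadata", and "license" are illustrative labels only.
    """
    entries = []
    for path in sorted(files):
        role, content = files[path]
        entries.append({
            "path": path,
            "role": role,
            "sha256": hashlib.sha256(content).hexdigest(),
            "size_bytes": len(content),
        })
    return {"package_id": package_id, "files": entries}


def verify_manifest(manifest, files):
    """Return the paths that are missing or whose checksum no longer matches."""
    failed = []
    for entry in manifest["files"]:
        item = files.get(entry["path"])
        if item is None:
            failed.append(entry["path"])
        elif hashlib.sha256(item[1]).hexdigest() != entry["sha256"]:
            failed.append(entry["path"])
    return failed


def manifest_to_json(manifest):
    """Serialize the manifest so it can travel with the package."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Checksums address the "file is corrupt" failure point, while the role labels keep the license, disclaimer, and metadata files tied to the dataset they describe.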
Questions? Contact: Steve Morris Head, Digital Library Initiatives NCSU Libraries Steven_Morris@ncsu.edu NCGDAP site: http://www.lib.ncsu.edu/ncgdap/