Explore the risks, values, and preservation challenges surrounding digital geospatial data. Learn about key data types, preservation solutions like migration and encapsulation, and the importance of maintaining older geospatial data for future use.
Preservation Issues Related to Digital Geospatial Data
Steven P. Morris, Head of Digital Library Initiatives, North Carolina State University Libraries
Library of Congress Workshop, April 21, 2008
Overview of the Problem Area: Outline • Revisiting Key Geospatial Data Types • Risks to Digital Geospatial Data • Value in Temporal/Historical Data • Archiving Challenges Note: Percentages based on the actual number of respondents to each question
Brief (Very) Overview of the Geospatial Domain
Data Types – Digital Orthophotography • All 100 NC counties with orthos • 1-5 flight years per county • 30-300 GB per flight
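The ranges on this slide imply a wide spread in total statewide storage. A rough sketch of the bounds follows, assuming every county sits at the same end of each range (which no real collection will) and using decimal terabytes (1 TB = 1000 GB). The function name is hypothetical.

```python
# Rough storage bounds for statewide orthophoto holdings, using only the
# ranges quoted on the slide. The function name is hypothetical.

def ortho_storage_bounds_gb(counties, flights_range, gb_per_flight_range):
    """Return (low, high) total storage in GB for the given ranges."""
    low = counties * flights_range[0] * gb_per_flight_range[0]
    high = counties * flights_range[1] * gb_per_flight_range[1]
    return low, high


# 100 counties, 1-5 flight years per county, 30-300 GB per flight.
low_gb, high_gb = ortho_storage_bounds_gb(100, (1, 5), (30, 300))
print(f"{low_gb / 1000:.0f} TB to {high_gb / 1000:.0f} TB")  # 3 TB to 150 TB
```

The true figure lies somewhere between these extremes; the point is only that orthophotography alone can reach the tens of terabytes.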
Geospatial Data Types – Vector GIS • County, municipal, state • Detailed, accurate, current • Frequently updated • Cadastral (tax parcels) • Street centerlines • Zoning • Topographic contours • School, sheriff, fire • Voting precincts • More …
Data Types – Spatial Databases • Vector and raster data • Relationships • Behaviors • Annotation • Data Models
Geospatial Data Types – Cartographic • GIS Software • Software project file (.mxd, .apr, …) • Data layer file (.avl, .lyr, …) • PDF map exports • Web Services-based representations
Other Geospatial Data Types – Place-based Data • Mobile, LBS, and social networking applications • Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function • Examples shown: oblique imagery, street view images, tax department photos, road videologs
Geospatial Data: Compelling Issues • Dynamic content • Constantly updated information • Data versioning • Digital object complexity • Spatially enabled databases • Complicated, multi-component formats • Proprietary formats
Risks to Geospatial Data
How would you describe your current geospatial archive? • Bob’s hard drive • Last week’s set of nightly tape backups • Several boxes of CDs and DVDs • The data back-end for our internet mapping application • A collection of files in our “GIS Folder” • A stand-alone spatial database • An enterprise GIS
Digital Preservation Points of Failure • Data is not saved, or … • can’t be found, or … • media is obsolete, or … • media is corrupt, or … • format is obsolete, or … • file is corrupt, or … • meaning is lost Solutions: • Migration • Emulation • Encapsulation • XML
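The "file is corrupt" and "can't be found" failure points are commonly detected with fixity checks: record a checksum for every file at ingest, then re-verify later. The slide does not name this technique, so what follows is a minimal sketch of one common approach, not the project's method. The function names and file names are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large rasters need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return {relative path: 'missing' or 'changed'} for files that fail."""
    root = Path(root)
    problems = {}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif sha256_of(target) != expected:
            problems[rel] = "changed"
    return problems


# Demonstration on throwaway files; the file names are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    (archive / "parcels.dat").write_bytes(b"original parcel data")
    (archive / "roads.dat").write_bytes(b"original road data")
    manifest = build_manifest(archive)

    clean = verify_manifest(archive, manifest)           # nothing changed yet

    (archive / "parcels.dat").write_bytes(b"corrupted")  # simulate corruption
    (archive / "roads.dat").unlink()                     # simulate loss
    damaged = verify_manifest(archive, manifest)

print(clean)
print(damaged)
```

A check like this addresses only the "file is corrupt" and "can't be found" points; obsolete formats and lost meaning need migration and metadata, not checksums.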
Risks to Geospatial Data • Producer focus on current data • Data overwrite as common practice • Future support of data formats in question • No open, supported format for vector data • Shift to web services-based access • Data becoming more ephemeral • Inadequate or nonexistent metadata • Impedes discovery and use • Increasing use of spatial databases for data management • The whole is greater than the sum of the parts
Value in Older Geospatial Data
Value in Older Data: Cultural Heritage • Future uses of data are difficult to anticipate (as with Sanborn Maps)
Value in Older Data: Solving Business Problems • Land use change analysis • Site location analysis • Real estate trends analysis • Disaster response • Resolution of legal challenges • Impervious surface maps [Figure: Suburban Development 1993/2002, near Mecklenburg-Cabarrus County border]
Application: Impervious Surface Change Mapping [Figure: four panels labeled A to D: 2004 Aerial Photography; 2002 Impervious; 2004 Impervious Update; 2004 Impervious using 2002 Mask]
Application: Land Use Change Mapping [Figure: input data and output GIS data showing developing areas, using Mecklenburg County 2002 true color orthorectified aerial photography]
Preservation Challenges
Challenge: Vector Data Formats • No widely-supported, open vector formats for geospatial data • Spatial Data Transfer Standard (SDTS) not widely supported • Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access” • Spatial Databases • The whole is more than the sum of the parts, and the whole is very difficult to preserve • Can export individual data layers for curation, but relationships and context are lost • Some thinking of using the spatial database as the primary archival platform
Challenge: Preserving Geodatabases • Spatial databases in general vs. ESRI Geodatabase “format” • Not just data layers and attributes—also topology, annotation, relationships, behaviors • ESRI Geodatabase archival issues • XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication • Some looking to Geodatabase as archival platform (in addition to feature class export)
Challenge: Cartographic Representation • Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.
Challenge: Geospatial Web Services • How to capture records from decision-making processes?
Challenge: Preservation Metadata Results from a 2006 survey of all 100 NC counties and 25 largest NC municipalities
Challenge: Data Capture 2006 Frequency of Capture Survey targeting North Carolina counties and municipalities Response: yes = 65.3%, no = 34.7%* (out of 57.6% response rate)
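The percentages on this slide can be turned back into approximate counts. The sketch below assumes the survey population is the 125 jurisdictions named on the preceding slide (100 counties plus the 25 largest municipalities); this slide does not state the population, so treat the counts as a consistency check rather than reported figures.

```python
# ASSUMPTION: population of 125 taken from the preceding slide
# (100 NC counties + 25 largest NC municipalities).
surveyed = 100 + 25
response_rate = 0.576        # from the slide
yes_share = 0.653            # share of respondents answering "yes"

respondents = round(surveyed * response_rate)
yes = round(respondents * yes_share)
no = respondents - yes

print(f"respondents: {respondents}, yes: {yes}, no: {no}")
print(f"yes among respondents: {yes / respondents:.1%}")
print(f"no among respondents:  {no / respondents:.1%}")
print(f"yes among all surveyed: {yes / surveyed:.1%}")
```

Under that assumption the counts reproduce the slide's 65.3% and 34.7% exactly. The "yes" share of everyone surveyed is much lower than the share of respondents, which is why percentages are stated against respondents rather than against the full population.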
Challenge: Digital Object Complexity
Building Data Bundles: The Zip Codes Example
Where is the Dataset?
Here’s One! • Files • Multi-file dataset • Georeferencing • Metadata file • Symbolization file • Additional documentation • License • Disclaimer • More • Metadata • FGDC • Acquisition metadata • Transfer metadata • Ingest metadata • Archive rights • Archive processes • Collection metadata • Series metadata
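The bundle on this slide can be pictured as one record that groups component files by role and carries several layers of metadata. The following is a minimal sketch of such a structure. The class, its method, the file names, and the choice of which roles count as required are all hypothetical; only the role and metadata category names come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class DataBundle:
    """One archived dataset: its component files plus layered metadata."""
    name: str
    files: dict = field(default_factory=dict)     # role -> list of file names
    metadata: dict = field(default_factory=dict)  # kind -> record (placeholder)

    def missing_roles(self, required=("dataset", "georeferencing", "metadata")):
        """List required file roles that have no files attached."""
        return [role for role in required if not self.files.get(role)]


# File names are invented for illustration; only the roles come from the slide.
zip_codes = DataBundle(
    name="zip_codes",
    files={
        "dataset": ["zipcodes.shp", "zipcodes.shx", "zipcodes.dbf"],
        "georeferencing": ["zipcodes.prj"],
        "metadata": ["zipcodes.xml"],
        "symbolization": ["zipcodes.lyr"],
        "documentation": ["license.txt", "disclaimer.txt"],
    },
    # Placeholder values; a real archive would store full records here.
    metadata={kind: None for kind in (
        "fgdc", "acquisition", "transfer", "ingest",
        "archive_rights", "archive_processes", "collection", "series",
    )},
)

incomplete = DataBundle(name="orphan_layer", files={"dataset": ["layer.shp"]})

print(zip_codes.missing_roles())
print(incomplete.missing_roles())
```

Treating the bundle as a single object makes it possible to flag a dataset that arrives without its georeferencing or metadata file before it is ingested.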
Other Challenges • Rights management • Data versioning • Semantic issues • Large scale content transfer • Integrating older analog data • More …
Looking for Solutions: Outline • Approaches to Archiving and Preservation • Current and Recent Geoarchiving Projects • Content Identification • Content Selection • Content Exchange • Digital Repository Development • Engaging Spatial Data Infrastructure • Archives Processes
Different Ways to Approach Preservation • Technical solutions: How do we preserve acquired content over the long term? • Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production? Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata
Different Ways to Approach Preservation • Technical solutions: How do we archive acquired content over the long term? • Build data repositories: not just as an end in itself but also as a catalyst for discussion within the data community • Develop repository ingest workflows: create technical points of engagement with other NDIIPP preservation projects and build on collective learning experience
Different Ways to Approach Preservation • Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be archived—from point of production? • Engage data producer community and spatial data infrastructure through outreach and engagement; influence practice • Sell the problem to software vendors and standards development • Find overlap with more compelling business problems: disaster preparedness, business continuity, road building, etc. • Start a discussion about roles at the local, state, and federal level
Current or Recent Geospatial Data Archiving Projects
Selected Geospatial Data Archive Projects
NC Geospatial Data Archiving Project • Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP) • One of 8 initial NDIIPP collection building partnerships • Focus on state and local geospatial content in North Carolina (state demonstration) • Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories • Objective: engage existing state/federal geospatial data infrastructures in preservation • Serve as catalyst for discussion within industry
NCGDAP Goals • Repository Goal • Capture at-risk data • Explore technical and organizational challenges • Project End Goal • Data Producers: Improved temporal data management practices • Archives: More efficient means of acquiring and preserving data; Progress towards best practices • Temporal data management vs. long-term preservation
Content Identification
Formal Inventory Processes • Alleviate “contact fatigue” on part of local agencies • 20 different NC state agencies contact local agencies for data … also, federal/regional agencies • Geospatial data is complex, requiring lengthy inventory process • Must capture descriptive, technical, and administrative information related to the data • Make the inventory available as a sharable data store
What do Inventories Offer to Archives? • Data Availability Information • Detailed information by data layer • Contact Information • Minimal Metadata • Descriptive, technical, administrative • Rights Information • Document Technical Environment • Software used, formats, transfer methods • Future Data Development Plans
Detailed Information About Data Source: NC OneMap Data Inventory 2004
Inventories as Source of Metadata Example: Surface Water
Content Selection
Selection Issues • Most content is already at some level of risk • Early-Middle-Late Stage issues • Middle stage is usually the “sweet spot”, e.g. TIFF orthophotos vs. raw images or compressed images • Also added-value products: digital maps, cartographic representation • Digital maps: “record” or not? • Frequency of capture