Geospatial Data Preservation Challenges at the Sub-National Level: The North Carolina Experience
Steve Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
Cambridge Conference, July 18, 2007
Outline • Project background • Targeted geospatial content • Risks to data • Value in older data • Challenges (Technical and organizational) • Solutions (?) • Next steps
NC Geospatial Data Archiving Project • Partnership between a university library (NCSU) and the NC Center for Geographic Information & Analysis • Part of the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) • Focus on state and local geospatial content in North Carolina (state demonstration) • Tied to the NC OneMap initiative, which provides seamless access to data, metadata, and inventories • Objective: engage existing state/federal geospatial data infrastructures in preservation • Serve as a catalyst for discussion within the industry
NCGDAP Goals • Repository Goal • Capture at-risk data • Explore technical and organizational challenges • Project End Goal • Data Producers: Improved temporal data management practices • Archives: More efficient means of acquiring and preserving data; Progress towards best practices Temporal data management vs. long-term preservation
Collection Focus: State and Local Government Geospatial Data • 96 of 100 North Carolina counties have GIS systems, as do many municipalities • Over 30 state agency data producers • Exceptional value • Detailed, current, accurate • Exceptional risk • Inconsistent or nonexistent archiving practices • Complicated formats and complex objects (Source: NC OneMap)
Carrboro, NC: Population 17,797 (2005 est.) • 22 downloadable GIS data layers • 10 web mapping applications • 3 OGC WMS services (web services) • 9 downloadable PDF map layers
NCGDAP Data Types – Vector GIS • County, municipal, state • Detailed, accurate, current • Frequently updated • Cadastral (tax parcels) • Street centerlines • Zoning • Topographic contours • School, sheriff, fire • Voting precincts • More …
NCGDAP Data Types – Digital Orthophotography • All 100 NC counties have orthophotos • 1-5 flight years per county • 30-300 GB per flight
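The ranges above imply a very wide spread in total orthophoto volume. A back-of-envelope sketch, assuming every county sits at the same end of both ranges at once (an oversimplification; the real statewide total falls somewhere between the two bounds):

```python
# Rough bounds on statewide orthophoto volume, using only the slide's figures.
# Assumption: all counties are simultaneously at the low (or high) end of
# both the flights-per-county range and the GB-per-flight range.
COUNTIES = 100
FLIGHTS_PER_COUNTY = (1, 5)   # flight years per county
GB_PER_FLIGHT = (30, 300)     # gigabytes per flight

def volume_bounds_gb(counties, flights, gb_per_flight):
    """Return (low, high) total storage in GB."""
    low = counties * flights[0] * gb_per_flight[0]
    high = counties * flights[1] * gb_per_flight[1]
    return low, high

low, high = volume_bounds_gb(COUNTIES, FLIGHTS_PER_COUNTY, GB_PER_FLIGHT)
print(f"{low:,} GB to {high:,} GB")  # 3,000 GB to 150,000 GB
```

That is roughly 3 TB to 150 TB for a single copy, before any replication for offsite or redundant storage.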
NCGDAP Data Types – Cartographic • GIS Software • Software project file (.mxd, .apr, …) • Data layer file (.avl, .lyr, …) • PDF map exports • Web Services-based representations Note: Percentages based on the actual number of respondents to each question
Other Data Types – Place-based Data • Oblique imagery • Mobile, location-based services (LBS), and social networking applications • Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function • Examples: street view images, tax department photos, road videologs
Digital Preservation Points of Failure • Data is not saved, or … • can’t be found, or … • media is obsolete, or … • media is corrupt, or … • format is obsolete, or … • file is corrupt, or … • meaning is lost. Solutions: migration, emulation, encapsulation, XML
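Of the failure points above, "file is corrupt" is the one most routinely caught by fixity checking: record a checksum when data is ingested, then recompute and compare it later. A minimal sketch using Python's standard `hashlib`; the manifest layout (a dict mapping each relative path to its SHA-256 hex digest) and the function names are illustrative assumptions, not a format prescribed by the project.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest, reading 1 MiB at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_manifest(root, manifest):
    """Return relative paths that are now missing or whose digest changed."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems
```

Checksums only detect damage; recovering from it still depends on having a second, independent copy.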
Risks to Geospatial Data • Producer focus on current data • Data overwrite as common practice • Future support of data formats in question • No open, supported format for vector data • Shift to web services-based access • Data becoming more ephemeral • Inadequate or nonexistent metadata • Impedes discovery and use • Increasing use of spatial databases for data management • The whole is greater than the sum of the parts
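Data overwrite can be countered at the producer level with a simple convention: before replacing a layer, copy the current version into a dated archive folder. A minimal sketch; the function name, archive folder, and `name_YYYY-MM-DD` naming pattern are hypothetical, and it assumes a single-file layer (multi-file formats would need every component file copied together).

```python
import shutil
from datetime import date
from pathlib import Path

def snapshot_then_replace(current, new_version, archive_dir, as_of=None):
    """Copy the live file into a dated archive folder, then overwrite it.

    Returns the path of the archived copy, or None if no live file existed.
    Hypothetical convention: snapshots are named <stem>_<YYYY-MM-DD><suffix>.
    """
    current, new_version, archive_dir = Path(current), Path(new_version), Path(archive_dir)
    stamp = (as_of or date.today()).isoformat()  # e.g. "2007-07-18"
    archived = None
    if current.exists():
        archive_dir.mkdir(parents=True, exist_ok=True)
        archived = archive_dir / f"{current.stem}_{stamp}{current.suffix}"
        shutil.copy2(current, archived)  # preserve the old state first
    shutil.copy2(new_version, current)   # then publish the update
    return archived
```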
Value in Older Data: Solving Business Problems • Land use change analysis • Site location analysis • Real estate trends analysis • Disaster response • Resolution of legal challenges • Impervious surface maps (Example image: suburban development, 1993/2002, near the Mecklenburg-Cabarrus County border)
Value in Older Data: Cultural Heritage Future uses of data are difficult to anticipate (as with Sanborn Maps)
Challenge: Vector Data Formats • No widely-supported, open vector formats for geospatial data • Spatial Data Transfer Standard (SDTS) not widely supported • Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access” • Spatial Databases • The whole is more than the sum of the parts, and the whole is very difficult to preserve • Can export individual data layers for curation, but relationships and context are lost • Some thinking of using the spatial database as the primary archival platform
Challenge: Cartographic Representation Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.
Challenge: Geospatial Web Services • How to capture records from decision-making processes? • Possible: atlas collections from automated image capture • Web 2.0 impact: emerging tiling and caching schemes (archive target?)
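Tiling schemes address map imagery by zoom level plus tile column and row, so a tile cache could be archived by enumerating the tiles covering an area. The slide does not name a scheme; this sketch assumes the widely used XYZ convention over spherical Web Mercator (as popularized by Google Maps and OpenStreetMap). The function names are hypothetical.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) XYZ tile containing a WGS84 point at a zoom level.

    Assumes Web Mercator XYZ tiling: 2**zoom tiles per axis, x increasing
    eastward from -180 degrees, y increasing southward from ~85.05 degrees N.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the far east/south edges into the last valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_for_bbox(west, south, east, north, zoom):
    """List (zoom, x, y) for every tile intersecting a lon/lat bounding box."""
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left corner
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
```

At zoom 0 any area falls in a single tile; each additional zoom level roughly quadruples the tile count for a large area, which is why deciding which cache levels to capture matters.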
Challenge: Preservation Metadata Results from a 2006 survey of all 100 NC counties and 25 largest NC municipalities
Challenge: Data Capture 2006 Frequency of Capture Survey targeting North Carolina counties and municipalities Response: yes = 65.3%, no = 34.7%* (out of 57.6% response rate)
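The two percentages combine multiplicatively. Assuming this is the same 2006 survey of 100 counties plus the 25 largest municipalities (125 agencies) and that every respondent answered this question, the figures work out to about 72 respondents, 47 "yes" and 25 "no", which reproduces the reported 65.3% and 34.7%. Only about 38% of all targeted agencies are thus confirmed as capturing data. Per the footnote, percentages are per question, so actual counts may differ.

```python
# Assumption: target population of 125 agencies, all respondents answered.
TARGET_AGENCIES = 125
RESPONSE_RATE = 0.576
YES_SHARE = 0.653                          # share of respondents answering "yes"

respondents = round(TARGET_AGENCIES * RESPONSE_RATE)  # 72
yes = round(respondents * YES_SHARE)                  # 47
no = respondents - yes                                # 25
share_of_all = yes / TARGET_AGENCIES                  # 0.376

print(respondents, yes, no, f"{no / respondents:.1%}", f"{share_of_all:.1%}")
# 72 47 25 34.7% 37.6%
```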
Data Capture Survey Results: Overview • Two-thirds of responding agencies create and retain periodic snapshots • Long-term retention more common in counties with larger populations • Storage environments vary, with servers and CD-ROMs most common • Offsite storage (or both onsite and offsite) is used by nearly half of the respondents • Popularity of historic images has resulted in scanning and geo-referencing of hardcopy aerial photos among one-third of the respondents
Solutions: Content Exchange Infrastructure • Volume of state/federal requests for local data (“contact fatigue”) spurs rethinking of archive strategy for data acquisition • Leveraging more compelling business reasons to put the data in motion (disaster preparedness, highway construction, census, …) • Content exchange networks: • Minimize need to make contact • Add technical, administrative, descriptive metadata • Establish rights and provenance
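The three kinds of metadata a content exchange network would attach, plus rights and provenance, can be pictured as one record travelling with each dataset. A minimal sketch; the class and field names are hypothetical, chosen for illustration rather than taken from any NCGDAP or NC OneMap schema.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ExchangeRecord:
    """Hypothetical metadata wrapper for one dataset in an exchange network."""
    # Descriptive metadata: what the data is and where it applies
    title: str
    extent: tuple                  # (west, south, east, north), decimal degrees
    # Administrative metadata: who produced it and on what terms
    producer: str
    rights: str
    provenance: list = field(default_factory=list)
    # Technical metadata: how to read and verify the payload
    file_format: str = ""
    sha256: str = ""

    def add_event(self, description):
        """Append a provenance event, such as a harvest or format migration."""
        self.provenance.append(description)

    def to_json(self):
        """Serialize the record; tuples become JSON arrays."""
        return json.dumps(asdict(self), indent=2)

rec = ExchangeRecord(
    title="Street centerlines",
    extent=(-79.2, 35.8, -79.0, 36.0),
    producer="Hypothetical County GIS",   # placeholder, not a real agency
    rights="terms supplied by producer",
)
rec.add_event("harvested via exchange network")
print(rec.to_json())
```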
Informing and Leveraging Other Infrastructure • NC GIS Inventory: efficient data identification; adding preservation elements • Orthophoto Data Distribution System: efficient transfer of large quantities of imagery • NC OneMap Data Download and Viewer: public access; data visualization • Street Centerline Data Distribution System: efficient transfer of data from 100 counties, with metadata and clarified rights
Solutions: Engaging Standards Efforts • Partnered with EDINA (UK) and NARA to approach the Open Geospatial Consortium (OGC) in 2005-2006 • Working Group charter approved by OGC Technical Committee plenary Dec. 2006
Points of Engagement with the OGC • GML for archiving • Geo Rights Management: adding archive use cases • Content packaging • Saving data state in web services interactions • Content replication (OGC/Open Grid Forum talks) • Persistent identifiers • Data versioning (metadata and catalog support) • Cartographic representation. Goal: cross-fertilization between the library/archives and geospatial communities
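Catalog support for data versioning largely comes down to answering "which version was current on a given date?". A minimal sketch, assuming each retained snapshot is indexed by the date it took effect; the function name and version identifiers are hypothetical.

```python
from bisect import bisect_right
from datetime import date

def version_as_of(versions, when):
    """Return the version in effect on `when`, or None if none existed yet.

    versions: list of (effective_date, version_id), sorted by date.
    """
    dates = [d for d, _ in versions]
    i = bisect_right(dates, when)   # number of versions effective on or before `when`
    return versions[i - 1][1] if i else None

parcels = [
    (date(2003, 1, 1), "parcels_2003"),
    (date(2005, 1, 1), "parcels_2005"),
    (date(2007, 1, 1), "parcels_2007"),
]
print(version_as_of(parcels, date(2006, 6, 30)))  # parcels_2005
```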
Role of Commercial Data Providers Project Status Cultivating a commercial market for older data. Part of “permanent access” is marketing, advertising, and putting older data into the path of the user
Signs of Hope • Software vendors are more keenly aware of temporal data management as a customer problem • Consulting firms increasingly see temporal data management and archiving as a business opportunity • Innovative practices emerging at local and state level to complement and inform national level activities Viral adoption of archiving practices vs. mandated archiving practices: which will have more effect?
Next Steps • Technical • Refining repository ingest workflow (currently using DSpace) • Further investigation into use of METS (Metadata Encoding and Transmission Standard) and PREMIS (PREservation Metadata: Implementation Strategies) • Content exchange tests with other organizations • Organizational • OGC Data Preservation Working Group • Engaging State Archives: local records outreach and records retention practices • Work towards formulating best practices for data capture by local agencies • Content exchange networks
Questions? Steve Morris Head, Digital Library Initiatives NCSU Libraries ph: (919) 515-1361 Steven_Morris@ncsu.edu http://www.lib.ncsu.edu/ncgdap