This workshop highlights the North Carolina Geospatial Data Archiving Project and its goals to capture at-risk data, explore challenges, and improve temporal data management practices. The project focuses on state and local government geospatial data in North Carolina.
Preserving State and Local Government Digital Geospatial Data North Carolina Partnerships Workshop on Archiving Digital Cartography and Geoinformation December 4, 2008 Steve Morris NCSU Libraries
NC Geospatial Data Archiving Project • One of eight initial collection building projects in the Library of Congress-funded NDIIPP (National Digital Information Infrastructure and Preservation Program) • Lead organizations: North Carolina State University Libraries and North Carolina Center for Geographic Information & Analysis (NCCGIA) • Focus: • State and local government geospatial data in NC • Repository development as catalyst for discussion • Goal: Engage SDI in data archiving and preservation • Initial 3 year project extended to March 2009
NCGDAP Project Goals • Repository Goal • Capture at-risk data • Explore technical and organizational challenges • Project End Goal • Data Producers: Improved temporal data management practices • Archives: More efficient means of acquiring and preserving data; Progress towards best practices Temporal data management vs. long-term preservation
Spatial Data Infrastructure Role in Archiving • Metadata standards and outreach • metadata quality, best practices • Inventories • Reduce “contact fatigue”, shareable info store • Content exchange networks • Leverage more compelling business reasons to put data in motion • Automate process, add technical & administrative metadata • Framework data communities • Snapshot frequency, schemas, format strategies
NCGDAP Data Types – Digital Orthophotography • All 100 NC counties with orthos • 1-5 flight years per county • 200-300 GB per flight
NCGDAP Data Types – Vector Data • Point, line, and polygon • Attached attribute data • Some layers frequently updated
NCGDAP Data Types – Vector Data • Cadastral (tax parcels) • Street centerlines • Zoning • Topographic contours • Public utilities • School, sheriff, fire • Voting precincts • More … Frequent Update More detailed, current, and accurate than state/federal data sources
Imagery = Durable Static Simple structure Mostly open formats Vector data = Volatile Frequent update Complex structure Mostly proprietary formats Downtown Raleigh Near State Capitol 2005 Wake County Ortho
Carrboro, NC : Population 17,797 (2005 est.) 24 downloadable GIS data layers 6 web mapping applications 4 OGC WMS services (web services) 9 downloadable PDF map layers
Value in Older Data: Cultural Heritage Future uses of data are difficult to anticipate (as with Sanborn Maps)
Value in Older Data: Solving Business Problems Land use change analysis Site location analysis Real estate trends analysis Disaster response Resolution of legal challenges Impervious surface maps Suburban Development 1993/2002 Near Mecklenburg County-Cabarrus County NC border
Problem: Lack of Temporal Data • Industry focus on “latest and greatest” data • Industry temporally-impaired from the point of view of data availability, software support, etc. • “Kill and fill” as a common approach to data management (past versions of vector data lost) • Loss of memory about the data • Of superseded county orthophoto flights in NC: • Only 22% recorded in the state’s GIS inventory • Only 30% accessible through county map servers • Some older inventories only available through Internet Archive
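As an illustration of an alternative to “kill and fill”, the sketch below copies the current version of a layer to a dated archive location before it is overwritten. It is a minimal sketch, not the NCGDAP workflow: the function name `publish_with_snapshot`, the file naming pattern, and the use of a single file to stand in for a (usually multi-file) vector dataset are all assumptions made for the example.

```python
import shutil
from datetime import date
from pathlib import Path


def publish_with_snapshot(new_file, live_file, archive_dir, snapshot_date=None):
    """Replace live_file with new_file, keeping a dated copy of the old version.

    Hypothetical helper: a single file stands in for the whole dataset.
    Returns the path of the archived copy, or None if there was nothing to archive.
    """
    live_file = Path(live_file)
    archive_dir = Path(archive_dir)
    snapshot_date = snapshot_date or date.today()

    archived = None
    if live_file.exists():
        archive_dir.mkdir(parents=True, exist_ok=True)
        # Assumed naming pattern: <layer>_<YYYY-MM-DD><extension>
        archived = archive_dir / (
            f"{live_file.stem}_{snapshot_date.isoformat()}{live_file.suffix}"
        )
        shutil.copy2(live_file, archived)  # keep the superseded version

    shutil.copy2(new_file, live_file)      # then "fill" with the new version
    return archived
```

A call such as `publish_with_snapshot("incoming/parcels.gpkg", "live/parcels.gpkg", "archive/")` (paths are illustrative) would leave the previous parcels file in `archive/` under a dated name instead of discarding it.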
Temporal Challenges with Geospatial Data • Complex vector formats: multi-file, multi-format • No non-proprietary, well-supported format for vector data • Shift to web services-based access • Data becoming more ephemeral • Often: Inadequate or nonexistent metadata • Impedes discovery and use • Increasing use of spatial databases for data management • The whole is greater than the sum of the parts but the whole is very hard to preserve • Content packaging • No geospatial industry standard
Problem: Putting the Data in Motion • Most costly part of archive development is identifying, negotiating acquisition, and then transferring data • Local agency “contact fatigue” resulting from repeated state, federal, and university requests for data • Archive development is low priority – leverage other business uses that can put the data in motion • Continuity of operations • Highway planning • Floodplain mapping Objective • Minimize direct contacts • Document data • Clarify rights • Routinize transfer
Problem: Metadata Metadata is often asynchronous, inconsistently structured, incomplete, or missing. Survey of current archiving practice among NC counties and municipalities
Problem: Content Packaging • Complex multi-file, multi-format objects • Shared ancillary components • Need to add administrative & technical metadata beyond geospatial metadata Metadata Exchange Format (MEF) in GeoNetwork is a form of content packaging
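One way to picture content packaging is a single container that holds the dataset's files plus a manifest carrying administrative and technical metadata. The sketch below is an assumption-laden illustration only: it is not the GeoNetwork MEF layout or any geospatial industry standard, and the function name `package_dataset`, the `data/` folder, and the `manifest.json` fields are invented for the example.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def package_dataset(files, package_path, descriptive_metadata):
    """Bundle a multi-file dataset into one zip with a checksum manifest.

    Hypothetical layout: data/<file> for content, manifest.json for metadata.
    """
    manifest = {
        "descriptive": descriptive_metadata,  # e.g. title, snapshot date, rights
        "files": [],                          # technical metadata per file
    }
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in map(Path, files):
            data = f.read_bytes()
            manifest["files"].append({
                "name": f.name,
                "bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),  # fixity value
            })
            zf.writestr(f"data/{f.name}", data)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

Recording a checksum per file lets a later custodian confirm that every component of a multi-file object arrived intact, which is the kind of technical metadata that geospatial metadata records do not usually carry.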
Different Ways to Approach Preservation • Technical solutions: How do we preserve acquired content over the long term? • Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production? Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata
Preservation Approaches: Temporal Data Snapshots Issue: How frequently should county and municipal vector data layers be captured in archives? Parcels, centerlines, jurisdictions, zoning, … Parcel Boundary Changes 2001-2004, North Raleigh, NC
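The snapshot frequency question can be informed by measuring how much a layer actually changes between two captures. The sketch below compares two snapshots held as plain dictionaries keyed by a parcel identifier. It is a simplified illustration under stated assumptions: the function names, the dictionary representation, and the idea of using a change rate as an input to capture frequency are not taken from the project, and real parcel data would need geometry-aware comparison.

```python
def diff_snapshots(old, new):
    """Compare two snapshots given as {parcel_id: geometry_or_attributes}."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": sorted(k for k in old_ids & new_ids if old[k] != new[k]),
    }


def change_rate(old, new):
    """Share of all parcels (old or new) that were added, removed, or changed."""
    d = diff_snapshots(old, new)
    touched = len(d["added"]) + len(d["removed"]) + len(d["changed"])
    return touched / max(len(set(old) | set(new)), 1)


# Toy snapshots; values stand in for geometry or attribute records.
snapshot_2001 = {"A": "poly1", "B": "poly2", "C": "poly3"}
snapshot_2004 = {"A": "poly1", "B": "poly2b", "D": "poly4"}
print(diff_snapshots(snapshot_2001, snapshot_2004))
print(change_rate(snapshot_2001, snapshot_2004))
```

A layer with a consistently high change rate between captures is a candidate for more frequent snapshots, while a stable layer may need fewer.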
NC Frequency of Data Capture Surveys • How often should continually changing vector datasets be captured? • Tap into data custodian understanding of production patterns and uses • Tap into local innovation • Learn about local business drivers for data archiving • 2006 and 2008 surveys of NC cities and counties • 2008 survey of archival practice in state agencies in NC • Planned survey of data users in NC http://www.nconemap.com/AboutNCOneMap/tabid/289/Default.aspx#preservation
Preservation Approaches: Original Data vs. “Desiccated” Data Complex data representations can be made more preservable (and less useful) through simplification
Capturing Complex and Ephemeral Data Representations • Complex documents may be very hard to preserve over time • GIS project files • Layer definitions • Web services or API interactions • Image outputs capture some sense of final product, but lose underlying data intelligence • GeoMAPP Multistate project: Engagement with ESRI on complex project archiving issues
Desiccated data: PDF and GeoPDF Counterpart to analog map = datasets plus data models, symbolization, classification, annotation, etc. More data intelligence survives in PDF documents than survives in most other “desiccated” formats
Geospatial PDF Trends • Explosion of geospatial PDF content recently • Standards issues • GeoPDF: formerly proprietary TerraGo technology now going through OGC standards process • PDF an open ISO standard • Open PDF variants created through ISO standards process (PDF/E, PDF/X, PDF/A, …) • NCGDAP approach: PDF content retained in addition to, NOT instead of original data
Changes in the Domain: New Location-Based Content Oblique Imagery Street Views 3D Images • Present-day value in location-based services and mobile applications
Changes in the Domain: New Location-Based Content Ortho image • Future value as cultural heritage resource • More descriptive of place and function than spatial data
Moving Forward • GICC Archival and Long-Term Access Committee • Geo Multistate Archival and Preservation Partnership (GeoMAPP) • OGC Data Preservation Working Group
Community Response to the Data Archiving Challenge • Nov. 2007: NC Geographic Information Coordinating Council (GICC): Ten Recommendations in Support of Geospatial Data Sharing released • Recommendation: “Establish archive and long term data access strategies” • Suggested best practices include: “Establish a policy and procedure for the provision of access to historic data, especially for framework data layers.”
NC GICC Archival and Long-Term Access Committee • Initiated Feb. 2008 in response to agency requests for guidance on temporal data management • Federal, state, regional, and local agency representation • Key focus • Best practices for data snapshots and retention • State Archives processes: appraisal, selection, retention schedules, etc. • Who, What, Why, When, Where, How • Final Report delivered to GICC in November 2008
GeoMAPP: Geospatial Multistate Archival and Preservation Partnership • Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA), State Archives of NC, with Library of Congress • Partners: • State geospatial organizations of Kentucky and Utah • State Archives of Kentucky and Utah • NCSU Libraries in catalytic/advisory role • State-to-state and geo-to-Archives collaboration • 2 year project: Nov. 2007-Dec. 2009 • Archives as part of Spatial Data Infrastructure
OGC Data Preservation Working Group • Formed Dec. 2006 • Engage archival community • Find points of intersection with other OGC activities: • GML for archiving • Content packaging • Large scale data transfers • Time in decision support
Conclusions • “Supporting temporal analysis requirements” gets more attention than “archiving and preservation” • Leverage existing infrastructure • Current data sharing needs drive infrastructure improvements that help archiving • Leverage business needs that are more compelling than preservation (e.g., continuity of operations) • Facilitate stakeholder ownership of the solutions • Mine state and local archiving innovations
Thank You! Contact: Steve Morris Head, Digital Library Initiatives North Carolina State University Libraries Steven_Morris@ncsu.edu NCGDAP: http://www.lib.ncsu.edu/ncgdap GeoMAPP: http://www.geomapp.com