NARA Meeting

Preservation of Digital Geospatial Data: Challenges and Opportunities. Steve Morris, Head of Digital Library Initiatives, North Carolina State University Libraries. NARA Meeting, Dec. 14, 2005.

Presentation Transcript


  1. Preservation of Digital Geospatial Data: Challenges and Opportunities. Steve Morris, Head of Digital Library Initiatives, North Carolina State University Libraries. NARA Meeting, Dec. 14, 2005

  2. Outline • Digital Geospatial Data: Types • Risks to Digital Geospatial Data • Overview of NC Geospatial Data Archiving Project • Preservation Challenges and Possible Solutions Note: Percentages based on the actual number of respondents to each question

  3. Geospatial data types: Vector data

  4. Geospatial data types: Satellite imagery

  5. Geospatial data types: Aerial imagery

  6. Geospatial data types: Aerial imagery

  7. Geospatial data types: Aerial imagery

  8. Geospatial data types: Tabular data (w/vector)

  9. Time series – vector data: Parcel Boundary Changes 2001-2004, North Raleigh, NC

  10. Time series – Ortho imagery: Vicinity of Raleigh-Durham International Airport, 1993-2002

  11. Today’s geospatial data as tomorrow’s cultural heritage

  12. Risks to Digital Geospatial Data .shp .mif .gml .e00 .dwg .dgn .bsb .bil .sid

  13. Risks to Digital Geospatial Data • Producer focus on current data • Time-versioned content generally not archived • Future support of data formats in question • Vast range of data formats in use (complex) • Shift to “streaming data” for access • Archives have been a by-product of providing access • Preservation metadata requirements • Descriptive, administrative, technical, DRM • Geodatabases • Complex functionality

  14. NC Geospatial Data Archiving Project • Partnership between university library (NCSU) and state agency (NCCGIA) • Focus on state and local geospatial content in North Carolina (state demonstration) • Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventory information • Objective: engage existing state/federal geospatial data infrastructures in preservation

  15. Targeted Content • Resource Types • GIS “vector” (point/line/polygon) data • Digital orthophotography • Digital maps • Tabular data (e.g. assessment data) • Content Producers • Mostly state, local, regional agencies • Some university, not-for-profit, commercial • Selected local federal projects

  16. Local Government GIS: Archival Issues • Data resources are highly distributed and subject to frequent update • More detailed, current, accurate than federal/state data resources • North Carolina local agency GIS environment • 100 counties, 95 with GIS • 85 counties with high resolution orthophotography • Growing number of municipal systems • Value: $162 million plus investment (est. in 2003)

  17. Work Plan in a Nutshell • Work from existing data inventories • NC OneMap Data Sharing Agreements as the “blanket”, individual agreements as the “quilt” • Partnership: work with existing geospatial data infrastructures (state and federal) • Technical approach • METS with FGDC, PREMIS?, GeoDRM? • DSpace now; re-ingest to different environment • Web services consumption for archival development
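The "METS with FGDC" packaging step in the work plan can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree, embeds an FGDC record in a METS dmdSec/mdWrap with MDTYPE="FGDC", and lists the data files in a fileSec and the required structMap. The file names and IDs are hypothetical, and PREMIS and GeoDRM sections are left out; this is an illustration, not the project's actual ingest package.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_tag(name: str) -> str:
    """Return a METS element name in ElementTree's {namespace}local form."""
    return f"{{{METS_NS}}}{name}"


def build_mets(fgdc_xml: str, files: list[str]) -> ET.Element:
    """Wrap an FGDC record and its data files in a minimal METS document."""
    mets = ET.Element(mets_tag("mets"))

    # Descriptive metadata: embed the FGDC record unchanged inside mdWrap/xmlData.
    dmd = ET.SubElement(mets, mets_tag("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, mets_tag("mdWrap"), MDTYPE="FGDC")
    xml_data = ET.SubElement(wrap, mets_tag("xmlData"))
    xml_data.append(ET.fromstring(fgdc_xml))

    # File inventory: one <file> per data file, located by a relative URL.
    file_sec = ET.SubElement(mets, mets_tag("fileSec"))
    file_grp = ET.SubElement(file_sec, mets_tag("fileGrp"), USE="original")
    for i, path in enumerate(files, start=1):
        file_el = ET.SubElement(file_grp, mets_tag("file"), ID=f"FILE{i}")
        ET.SubElement(file_el, mets_tag("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path})

    # Structural map (required by METS): one dataset-level div pointing at every file.
    struct_map = ET.SubElement(mets, mets_tag("structMap"))
    div = ET.SubElement(struct_map, mets_tag("div"), TYPE="dataset", DMDID="DMD1")
    for i in range(1, len(files) + 1):
        ET.SubElement(div, mets_tag("fptr"), FILEID=f"FILE{i}")
    return mets


if __name__ == "__main__":
    fgdc = "<metadata><idinfo><citation><citeinfo><title>Parcels</title></citeinfo></citation></idinfo></metadata>"
    doc = build_mets(fgdc, ["parcels.shp", "parcels.dbf", "parcels.shx"])
    print(ET.tostring(doc, encoding="unicode"))
```

Because the FGDC record is carried verbatim inside the wrapper, the same package could later be re-ingested into a different repository environment without first converting the metadata.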

  18. NCGDAP Philosophy of Engagement • Provide feedback to producer organizations / inform state geospatial infrastructure • Take the data in the manner in which it can be obtained • “Wrangle” and archive data • Note the ‘Project’ in ‘North Carolina Geospatial Data Archiving Project’: the process, the learning experience, and the engagement with geospatial data infrastructures are more important than the archive

  19. Big Challenges • Format migration paths • Management of data versions over time • Preservation metadata • Harnessing geospatial web services • Preserving cartographic representation • Keeping content repository-agnostic • Preserving geodatabases • More …

  20. Vector Data Format Issues • Vector data much more complicated than image data • ‘Archiving’ vs. ‘Permanent access’ • An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology then it does not provide permanent access • Piles of XML need to be widely understood piles • GML: need widely accepted application schemas (like OSMM?) • The Geodatabase conundrum • Export feature classes, and lose topology, annotation, relationships, etc. • … or use the Geodatabase as the primary archival platform (some are now thinking this way)

  21. GIS Software Used: NC Local Agencies (Source: NC OneMap Data Inventory 2004)

  22. Vector Data Format Options • Option A: use an open format and accept a really unfortunate transformation and limited vendor support for the output object • Option B: use a closed format but retain the original content and count on short- and medium-term vendor support • Option C: do both to buy time, and look for an open, ASCII-based solution (watch GML activity). No sweet spot, just an evolving and changing mix of flawed options that are used in combination.

  23. Geography Markup Language Issues • GML still more useful as a transfer format than an archival format; support limited even for transfer • “Permanent access” requirements: • profiles and application schemas widely understood and supported, avoid requiring “digital archaeology” • role of GML Simple Features Profile? • Assessing formats for preservation: sustainability factors, quality & functionality factors • Apply same approach to GML profiles and application schemas?

  24. Geography Markup Language Issues • Plans for an environmental scan of existing GML profiles and application schemas • schema name (e.g. OSMM, top10NL, ESRI GML, LandGML) • responsible agency; schema has official government status? • GML version; known unsupported GML components • schema history; known interoperation with other schemas • vendor support; translator support; stability over time

  25. Managing Time-versioned Content

  26. Managing Time-versioned Content • Many local agency data layers continuously updated • E.g., some county cadastral data updated daily; older versions not generally available • Individual versioned datasets will wander off from the archive • How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”? • How do we certify concurrency and agreement between the metadata and the data?
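One common way to certify that a metadata record and a dataset still agree, not drawn from the slides, is to embed a fixity checksum of the data in the metadata and recompute it on demand. The sketch below assumes a hypothetical JSON record with dataset_id, version, and sha256 fields. It can show that a stray copy matches the record it travels with; it cannot show that the copy is the current version.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a dataset's bytes."""
    return hashlib.sha256(data).hexdigest()


def make_metadata(dataset_id: str, version: str, data: bytes) -> str:
    """Build a hypothetical metadata record that embeds a checksum of the data it describes."""
    record = {"dataset_id": dataset_id, "version": version, "sha256": sha256_of(data)}
    return json.dumps(record)


def metadata_matches(metadata_json: str, data: bytes) -> bool:
    """True only if the data bytes are exactly those the record was written for."""
    record = json.loads(metadata_json)
    return record["sha256"] == sha256_of(data)


if __name__ == "__main__":
    snapshot = b"...parcel data for one day..."
    md = make_metadata("wake-parcels", "2005-12-01", snapshot)
    print(metadata_matches(md, snapshot))         # True
    print(metadata_matches(md, snapshot + b"x"))  # False: data changed after the record was written
```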

  27. Managing Time-versioned Content • Can we manage the relationship loosely using a persistent identifier link to a parent object? [Diagram: Persistent ID Resolver, Parent Object Manager, and multiple dataset versions]
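The loose parent/version relationship on this slide might be modeled roughly as follows. This is a minimal in-memory sketch under stated assumptions: each stray version carries its parent's persistent ID in its own metadata (the hypothetical parent_id key), versions are added in chronological order, and the identifier "ncgdap:wake-parcels" is made up. A real system would use an actual persistent identifier service rather than a Python dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    version_id: str  # e.g. the date of a daily cadastral snapshot
    location: str    # where this version's archival package is stored


@dataclass
class ParentObject:
    persistent_id: str
    versions: list[Version] = field(default_factory=list)

    def add_version(self, version: Version) -> None:
        # Assumes versions arrive in chronological order.
        self.versions.append(version)

    def current(self) -> Version:
        return self.versions[-1]


class Resolver:
    """Maps persistent IDs to parent objects (an in-memory stand-in for a real PID service)."""

    def __init__(self) -> None:
        self._parents: dict[str, ParentObject] = {}

    def register(self, parent: ParentObject) -> None:
        self._parents[parent.persistent_id] = parent

    def resolve(self, persistent_id: str) -> ParentObject:
        return self._parents[persistent_id]


def current_for(resolver: Resolver, stray_metadata: dict) -> Version:
    """Given metadata from a version found 'in the wild', return the current version of its parent."""
    return resolver.resolve(stray_metadata["parent_id"]).current()


if __name__ == "__main__":
    resolver = Resolver()
    parcels = ParentObject("ncgdap:wake-parcels")
    parcels.add_version(Version("2005-12-01", "archive/wake-parcels/2005-12-01.zip"))
    resolver.register(parcels)
    parcels.add_version(Version("2005-12-02", "archive/wake-parcels/2005-12-02.zip"))
    stray = {"parent_id": "ncgdap:wake-parcels", "version_id": "2005-12-01"}
    print(current_for(resolver, stray).version_id)  # 2005-12-02
```

The design keeps the link loose: a stray version only needs to carry one identifier, and everything else (current metadata, DRM, newer versions) is looked up through the parent.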

  28. Preservation Metadata Issues • FGDC Metadata • Many flavors; incoming metadata needs processing • Cross-walk elements to PREMIS, MODS? • Metadata wrapper/Content packaging • METS (Metadata Encoding and Transmission Standard) vs. other industry solutions • Need a geospatial industry solution for the ‘METS-like problem’ • GeoDRM a likely trigger: wrapper to enforce licensing (MPEG-21 references in OGIS Web Services 3)
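A crosswalk from FGDC to MODS could start as a simple path-to-path mapping. The sketch below covers only four FGDC CSDGM elements (title, origin, pubdate, abstract) and maps them to MODS elements chosen for illustration. It is a hypothetical, deliberately partial mapping, not an official FGDC-to-MODS crosswalk, and it assumes the incoming record has already been normalized, since FGDC records arrive in many flavors.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical partial crosswalk: FGDC path (relative to <metadata>) -> MODS element path.
CROSSWALK = {
    "idinfo/citation/citeinfo/title": ["titleInfo", "title"],
    "idinfo/citation/citeinfo/origin": ["name", "namePart"],
    "idinfo/citation/citeinfo/pubdate": ["originInfo", "dateIssued"],
    "idinfo/descript/abstract": ["abstract"],
}


def fgdc_to_mods(fgdc_xml: str) -> ET.Element:
    """Copy the text of each mapped FGDC element into a new MODS record."""
    src = ET.fromstring(fgdc_xml)  # root element is <metadata>
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for fgdc_path, mods_path in CROSSWALK.items():
        for node in src.findall(fgdc_path):
            text = (node.text or "").strip()
            if not text:
                continue  # skip empty elements rather than emit empty MODS fields
            parent = mods
            for tag in mods_path[:-1]:
                parent = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}")
            leaf = ET.SubElement(parent, f"{{{MODS_NS}}}{mods_path[-1]}")
            leaf.text = text  # e.g. FGDC pubdate is passed through unchanged (YYYYMMDD)
    return mods


if __name__ == "__main__":
    fgdc = (
        "<metadata><idinfo><citation><citeinfo>"
        "<origin>Example County GIS</origin><pubdate>20051201</pubdate><title>Parcels</title>"
        "</citeinfo></citation><descript><abstract>Parcel polygons.</abstract></descript></idinfo></metadata>"
    )
    print(ET.tostring(fgdc_to_mods(fgdc), encoding="unicode"))
```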

  29. Metadata Availability

  30. Harnessing Geospatial Web Services

  36. Geospatial Web Service Types • Image services • Deliver image resulting from query against underlying data • Limited opportunity for analysis • Feature services • Stream actual feature data, greater opportunity for data analysis • Other • Geocoding services • Routing • etc.

  38. Geospatial Web Services Rights Issues. Example: Desktop GIS-accessible ArcIMS • 39 of 100 NC counties have desktop GIS-accessible ArcIMS services • It is difficult to know how many of these counties actually expect users to either: • A) access data through desktop GIS for viewing only, or • B) extract and download data

  39. Harnessing Geospatial Web Services • Automated content identification • ‘capabilities files,’ registries, catalog services • WMS (Web Map Service) for batch extraction of image atlases • last-ditch capture option • preserve cartographic representation • retain records of decision-making process • … feature services (WFS) later. • Rights issues in the web services space are ambiguous
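Batch extraction of an image atlas from a WMS server amounts to tiling an area of interest and issuing one GetMap request per tile. The sketch below only builds the request URLs, using the standard WMS 1.1.1 GetMap parameters (SERVICE, VERSION, REQUEST, LAYERS, STYLES, SRS, BBOX, WIDTH, HEIGHT, FORMAT). The server URL and layer name are hypothetical and would in practice be read from the server's capabilities file. It downloads nothing and does not settle the rights question raised on the slide.

```python
from urllib.parse import urlencode


def getmap_tile_urls(base_url, layer, bbox, cols, rows, tile_width=1024, srs="EPSG:4326", fmt="image/png"):
    """Split bbox = (minx, miny, maxx, maxy) into cols x rows tiles; return one WMS 1.1.1 GetMap URL per tile.

    With WMS 1.1.1 and EPSG:4326, x is longitude and y is latitude.
    """
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    # Keep pixels square: derive the tile height from the tile's aspect ratio.
    tile_height = max(1, round(tile_width * dy / dx))
    urls = []
    for r in range(rows):
        for c in range(cols):
            tile = (minx + c * dx, miny + r * dy, minx + (c + 1) * dx, miny + (r + 1) * dy)
            params = {
                "SERVICE": "WMS",
                "VERSION": "1.1.1",
                "REQUEST": "GetMap",
                "LAYERS": layer,
                "STYLES": "",  # empty value requests the server's default style
                "SRS": srs,
                "BBOX": ",".join(f"{v:.6f}" for v in tile),
                "WIDTH": tile_width,
                "HEIGHT": tile_height,
                "FORMAT": fmt,
            }
            urls.append(f"{base_url}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    # Hypothetical server and layer; a 2 x 2 grid over an area around Raleigh, NC (lon/lat degrees).
    for url in getmap_tile_urls("https://gis.example.gov/wms", "orthos_2002", (-78.9, 35.7, -78.5, 36.0), cols=2, rows=2):
        print(url)
```

Recording the generated URLs alongside the captured images would also preserve a trace of what was requested, which relates to the slide's point about retaining records of the decision-making process.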

  40. “Web mash-ups” and the New Mainstream Geospatial Web Services

  41. Preserving Cartographic Representation

  42. Preserving Cartographic Representation • The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data: • Intellectual choices about symbolization, layer combinations • Data models, analysis, annotations • Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration • Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem

  43. Preserving Cartographic Representation • Image-based approaches • Generate images using Map Book or similar tools • Harvest existing atlas images • Capture atlases from WMS servers • Export ‘layouts’ or ‘maps’ to image • Vector-based approaches • Store explicitly in the data format (e.g. Feature Class Representation in ArcGIS 9.2) • Archive and upward-migrate existing files .avl, .apr, .lyr, .mxd, etc. • SVG, VML or other XML approaches • Other?
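As an illustration of the "SVG, VML or other XML approaches" option, the sketch below writes polygons and their symbology into a standalone SVG string and stores each symbol's meaning in a title element, so the legend text travels with the drawing. The class names, colors, and labels are hypothetical. It handles simple polygons only (no holes, labels, or line and point symbols) and uses a plain linear transform rather than a map projection.

```python
from xml.sax.saxutils import escape, quoteattr


def to_svg(features, symbols, bounds, size=400):
    """Render polygons to a standalone SVG string.

    features: list of (class_name, [(x, y), ...]) polygons in map units.
    symbols:  class_name -> {"fill": ..., "stroke": ..., "label": ...}.
    bounds:   (minx, miny, maxx, maxy) of the map extent.
    """
    minx, miny, maxx, maxy = bounds
    scale = size / max(maxx - minx, maxy - miny)

    def to_px(x, y):
        # SVG's y axis points down and map y points up, so flip y.
        return (x - minx) * scale, (maxy - y) * scale

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for cls, coords in features:
        sym = symbols[cls]
        pixel_pts = [to_px(x, y) for x, y in coords]
        d = " ".join(f"{'M' if i == 0 else 'L'}{px:.1f},{py:.1f}" for i, (px, py) in enumerate(pixel_pts)) + " Z"
        # The <title> keeps the symbol's meaning next to the geometry it styles.
        parts.append(
            f'<path class={quoteattr(cls)} d="{d}" fill={quoteattr(sym["fill"])} '
            f'stroke={quoteattr(sym["stroke"])}><title>{escape(sym["label"])}</title></path>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    style = {"zoning_R1": {"fill": "#ffff99", "stroke": "#333333", "label": "Residential R-1"}}
    print(to_svg([("zoning_R1", square)], style, (0, 0, 10, 10), size=100))
```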

  44. Preserving Cartographic Representation

  45. Preserving Cartographic Representation

  46. Repository Architecture Issues • Interest in how geospatial content interacts with widely available digital repository software • Focus on salient, domain-specific issues • Challenge: remain repository agnostic • Avoid “imprinting” on repository software environment • Preservation package should not be the same as the ingest object of the first environment • Tension between exploiting repository software features vs. becoming software dependent
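One way to keep the preservation package independent of the first repository's ingest object is to store content in a plain, tool-neutral directory layout. The sketch below swaps in the BagIt convention (later published as IETF RFC 8493), which the slides do not mention. It writes only the required bagit.txt declaration, the data/ payload directory, and a SHA-256 payload manifest, and omits bag-info.txt and tag manifests. File and directory names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

BAGIT_DECLARATION = "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write payload files under bag_dir/data and record a SHA-256 manifest."""
    bag_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = bag_dir / "data" / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        manifest_lines.append(f"{hashlib.sha256(content).hexdigest()}  data/{rel_path}")
    (bag_dir / "bagit.txt").write_text(BAGIT_DECLARATION, encoding="utf-8")
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n", encoding="utf-8")


def verify_bag(bag_dir: Path) -> bool:
    """Recompute every payload checksum listed in the manifest (fixity only, not completeness)."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        digest, rel_path = line.split(maxsplit=1)
        if hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest() != digest:
            return False
    return True


if __name__ == "__main__":
    bag = Path(tempfile.mkdtemp()) / "wake-parcels-2005-12-01"
    make_bag(bag, {"parcels.shp": b"...", "parcels.xml": b"<metadata/>"})
    print(verify_bag(bag))  # True
```

Since the layout is just files and text manifests, a package built this way can be handed to DSpace today and re-ingested elsewhere later without depending on either system's internal object model.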

  47. Preserving Geodatabases • Spatial databases in general vs. ESRI Geodatabase “format” • Not just data layers and attributes—also topology, annotation, relationships, behaviors • ESRI Geodatabase archival issues • XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication • Some looking to Geodatabase as archival platform (in addition to feature class export)

  48. Geodatabase Availability • Local agencies, especially municipalities, are increasingly turning to the ESRI Geodatabase format to manage geospatial data. • According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in that format.

  49. Evolving Geodatabase Handling Approaches

  50. Efficient Content Replication • Content replication also needed for: • Disaster preparedness • State and federal data improvement projects • Aggregation by regional geospatial web service providers • WFS, e.g.: efficiency in complete content transfer? • Rsync-like function, plus: rights management, inventory processes, metadata management, informed by data update cycles • Archiving delta files vs. complete replication – need to avoid requiring “digital archaeology” in the future
