Cooperative Project with Library of Congress on Preservation of Digital Geospatial Data. Steve Morris, Head of Digital Library Initiatives, NCSU Libraries
NC Geospatial Data Archiving Project (NCGDAP) • Partnership between NCSU Libraries and NC Center for Geographic Information & Analysis • $520,000 funding – 3 years • Focus on state and local geospatial content in North Carolina (state demonstration) • Address NC OneMap objective: “Historic and temporal data will be maintained and available.” • One of eight projects in the first NDIIPP funding round: “Building a Network of Partners” Note: Percentages based on the actual number of respondents to each question
NDIIPP Overview • National Digital Information Infrastructure and Preservation Program • Congress appropriated $100 million for this effort, instructing the Library of Congress to spend an initial $25 million to develop and execute a congressionally approved strategic plan • Eight initial projects, 2004-2007: • web pages, cultural heritage, numeric data, video, business records, mixed content, geospatial (2) • Developing partnerships and identifying issues • Extensive interaction among NDIIPP projects
Targeted Content • Resource Types • GIS “vector” (point/line/polygon) data • Digital orthophotography • Digital maps • Tabular data (e.g. assessment data) • Content Producers • Mostly state, local, regional agencies • Some university, not-for-profit, commercial • Selected local federal projects
Risks to Digital Geospatial Data .shp .mif .gml .e00 .dwg .dgn .bsb .bil .sid
Risks to Digital Geospatial Data • Focus on current data • Archiving data does not guarantee “permanent access” • Future support of data formats in question • Need to migrate formats or allow for emulation • Data failure • “Bit rot”, media failure • Preservation metadata requirements • Descriptive, administrative, technical, DRM • Shift to “streaming data” for access
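A common safeguard against “bit rot” and media failure is routine fixity checking: record a checksum for each archived file at ingest and recompute it on a schedule. The Python sketch below assumes a simple manifest that maps relative file paths to SHA-256 hex digests; the function names and manifest layout are illustrative assumptions, not part of any NCGDAP workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large ortho images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: dict[str, str], root: Path) -> list[str]:
    """Return relative paths whose current checksum no longer matches the manifest.

    A missing file is reported as a failure too, since media loss is also data failure.
    """
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

Running the audit on a copy stored on different media also catches failures that a single-copy check would miss.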
Time series – vector data: Parcel Boundary Changes 2001-2004, North Raleigh, NC
Time series – Ortho imagery: Vicinity of Raleigh-Durham International Airport 1993-2002
Today’s geospatial data as tomorrow’s cultural heritage
Earlier NCSU Acquisition Efforts • NCSU University Extension project 2000-2001 • Target: County/city data in eastern NC • “Digital rescue” not “digital preservation” • Project learning outcomes • Confirmed concerns about long term access • Need for efficient inventory/acquisition • Wide range in rights/licensing • Need to work within statewide infrastructure • Acquired experience; unanticipated collaboration
One Earlier Project Outcome: Directory of County and City Services • Among top 15 most used resources on library web site • 99.5% of directory users from outside ncsu.edu
NDIIPP Project Phases • Content Identification and Selection • Content Acquisition • Partnership Building • Content Retention and Transfer • All 8 NDIIPP cooperative projects adhere to this structure
Content Identification and Selection • Work from NC OneMap Data Inventory • Combine with inventory information from various state agencies and from previous NCSU efforts • Develop methodology for selecting from among “early,” “middle,” and “late” stage products • Develop criteria for time series development • Investigate use of emerging Open Geospatial Consortium technologies in data identification
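Capabilities documents are one OGC technology that could support data identification: a WMS GetCapabilities response lists the layers a server offers. A minimal sketch, assuming the capabilities XML has already been fetched; it lists requestable layers (those with a Name) and ignores namespaces so that WMS 1.1.1 and 1.3.0 responses are handled alike. `list_named_layers` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://www.opengis.net/wms}'."""
    return tag.rsplit("}", 1)[-1]


def list_named_layers(capabilities_xml: str) -> list[tuple[str, str]]:
    """Return (Name, Title) for each Layer that has a Name, i.e. can be requested."""
    root = ET.fromstring(capabilities_xml)
    layers = []
    for elem in root.iter():
        if _local(elem.tag) != "Layer":
            continue
        name = title = None
        for child in elem:  # direct children only, so nested layers are not mixed in
            if _local(child.tag) == "Name":
                name = (child.text or "").strip()
            elif _local(child.tag) == "Title":
                title = (child.text or "").strip()
        if name:
            layers.append((name, title or ""))
    return layers
```

Unnamed parent layers are only grouping containers in WMS, which is why the sketch skips them.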
Content Acquisition • Work from NC OneMap Data Sharing Agreements as a starting point (the “blanket”) • Secure individual agreements (the “quilt”) • Investigate use of OGC technologies in capture • Use METS (Metadata Encoding and Transmission Standard) as a metadata wrapper • Bundle data files, metadata, ancillary documentation • Supplement FGDC metadata with additional administrative, technical, and descriptive metadata • Encode rights (Digital Rights Management – DRM) • Links to services
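A minimal sketch of the METS wrapper idea, assuming the FGDC record is kept as a separate file referenced from a dmdSec and each data or documentation file carries a SHA-256 checksum. It emits only the skeleton (dmdSec, fileSec, structMap); the administrative, technical, and rights sections named on this slide are omitted, and `build_mets` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def q(ns: str, tag: str) -> str:
    """Build an ElementTree qualified name like '{uri}tag'."""
    return f"{{{ns}}}{tag}"


def build_mets(fgdc_href: str, files: list[tuple[str, str, str]]) -> str:
    """files: (href, USE group such as 'data' or 'documentation', sha256 hex)."""
    root = ET.Element(q(METS, "mets"))
    dmd = ET.SubElement(root, q(METS, "dmdSec"), ID="DMD1")
    # Point at the FGDC record instead of embedding it.
    ET.SubElement(dmd, q(METS, "mdRef"),
                  {"LOCTYPE": "URL", "MDTYPE": "FGDC", q(XLINK, "href"): fgdc_href})
    file_sec = ET.SubElement(root, q(METS, "fileSec"))
    struct = ET.SubElement(root, q(METS, "structMap"))
    div = ET.SubElement(struct, q(METS, "div"), TYPE="dataset", DMDID="DMD1")
    groups = {}
    for n, (href, use, sha) in enumerate(files, start=1):
        if use not in groups:  # one fileGrp per USE value
            groups[use] = ET.SubElement(file_sec, q(METS, "fileGrp"), USE=use)
        fid = f"F{n}"
        f = ET.SubElement(groups[use], q(METS, "file"),
                          ID=fid, CHECKSUM=sha, CHECKSUMTYPE="SHA-256")
        ET.SubElement(f, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): href})
        ET.SubElement(div, q(METS, "fptr"), FILEID=fid)  # tie the file into the dataset
    return ET.tostring(root, encoding="unicode")
```

Keeping checksums in the wrapper lets the package carry its own fixity information when it moves between repositories.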
Partnership Building • Work within context of the NC OneMap initiative • Explore state, local, federal partnerships • Defined characteristic: “Historic and temporal data will be maintained and available” • Advisory Committee drawn from the NC Geographic Information Coordinating Council subcommittees • Seek external partners • National States Geographic Information Council • FGDC Historical Data Committee • … more
Content Retention and Transfer • Ingest into DSpace open source digital repository software • Look more generically at the issue of putting geospatial content into digital repositories • Investigate re-ingest into a second platform • Start to define format migration paths • Special problem: geodatabases • Pursue long-term solution • Roles of data producing agencies, state agencies; NC OneMap; NCSU
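One way to start defining format migration paths is to treat tested conversions as a directed graph and search it for the shortest chain from a source format to a target. The conversion table in this sketch is a hypothetical example, not a vetted migration policy.

```python
from collections import deque
from typing import Optional

# Hypothetical table of conversions an archive has tested (source -> targets).
CONVERSIONS = {
    "e00": ["coverage"],
    "coverage": ["shp", "geodatabase"],
    "dwg": ["dxf"],
    "dxf": ["shp"],
    "shp": ["gml"],
    "geodatabase": ["shp", "gml"],
}


def migration_path(source: str, target: str) -> Optional[list[str]]:
    """Breadth-first search: shortest chain of tested conversions, or None if none exists."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        if fmt == target:
            path = []
            while fmt is not None:  # walk back through predecessors
                path.append(fmt)
                fmt = previous[fmt]
            return path[::-1]
        for nxt in CONVERSIONS.get(fmt, []):
            if nxt not in previous:
                previous[nxt] = fmt
                queue.append(nxt)
    return None
```

Shortest is only one possible criterion; each hop may lose information (topology, annotation), so a real policy might weight edges by known losses instead.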
Big Geoarchiving Challenges • Format migration paths • Management of data versions over time • Preservation metadata • Preserving cartographic representation • Keeping content repository-agnostic • Preserving geodatabases • Harnessing geospatial web services • More …
Vector Data Format Issues • Vector data much more complicated than image data • ‘Preservation’ vs. ‘Permanent access’ • An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology then it does not provide permanent access • Piles of XML need to be widely understood piles • GML: need widely accepted application schemas (like OSMM?) • The Geodatabase conundrum • Export feature classes, and lose topology, annotation, relationships, etc. • … or use the Geodatabase as the primary archival platform (some are now thinking this way)
Geography Markup Language Issues • GML still more useful as a transfer format than as an archival format; support is limited even for transfer • FGDC Historical Data Working Group investigations into GML for use in archiving • Plans for environmental scan of existing GML profiles and application schemas: • schema name (e.g. OSMM, top10NL, ESRI GML, LandGML) • responsible agency; schema has official government status? • GML version; known unsupported GML components • schema history; known interoperation with other schemas • vendor support; translator support
Managing Time-versioned Content • Many local agency data layers continuously updated • Older versions not generally available • Individual versioned datasets will wander off from the archive • How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”? • How do we certify concurrency and agreement between the metadata and the data?
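One possible answer to the “in the wild” question, sketched under assumptions: if the archive records a checksum for every versioned snapshot, a stray copy can be matched to its exact snapshot by hashing it, and the newest snapshot of the same layer can then be returned along with its metadata. The `Version` record and its fields are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    layer_id: str       # hypothetical persistent identifier for the data layer
    valid_from: str     # ISO date the snapshot was captured, e.g. "2003-06-30"
    sha256: str         # checksum of the archived data file
    metadata_href: str  # where the archive keeps the matching metadata/DRM record


def resolve(data: bytes, versions: list[Version]) -> Optional[tuple[Version, Version]]:
    """Match a copy found 'in the wild' to the archived snapshot it came from,
    and return that snapshot plus the newest snapshot of the same layer."""
    digest = hashlib.sha256(data).hexdigest()
    matched = next((v for v in versions if v.sha256 == digest), None)
    if matched is None:
        return None  # altered or never archived: its metadata cannot be vouched for
    same_layer = [v for v in versions if v.layer_id == matched.layer_id]
    latest = max(same_layer, key=lambda v: v.valid_from)  # ISO dates sort as strings
    return matched, latest
```

Because the match is on the exact bytes, it also certifies that the metadata record found this way describes that very data, which speaks to the concurrency question on this slide.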
Preservation Metadata Issues • FGDC Metadata • Many flavors; incoming metadata needs processing • Other standards: PREMIS, MODS • Metadata wrapper • METS (Metadata Encoding and Transmission Standard) vs. other industry solutions • Need a geospatial industry solution for the ‘METS-like problem’ • GeoDRM a likely trigger: wrapper to enforce licensing (MPEG-21 references in OGC Web Services 3)
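Processing incoming FGDC metadata usually begins with checking that the expected elements are present. A minimal sketch that reports missing elements in an FGDC CSDGM XML record; the element list is a small subset chosen for illustration, not the standard's full set of mandatory elements, and `missing_elements` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Small illustrative subset of CSDGM element paths (relative to the <metadata> root).
REQUIRED_PATHS = [
    "idinfo/citation/citeinfo/title",
    "idinfo/descript/abstract",
    "idinfo/timeperd",
    "metainfo/metd",
]


def missing_elements(fgdc_xml: str) -> list[str]:
    """List required paths that are absent or empty in one FGDC record."""
    root = ET.fromstring(fgdc_xml)
    if root.tag != "metadata":
        return ["metadata (root)"]  # not CSDGM XML at all; needs a different crosswalk
    missing = []
    for path in REQUIRED_PATHS:
        node = root.find(path)
        # An element with neither children nor text counts as missing.
        if node is None or (len(node) == 0 and not (node.text or "").strip()):
            missing.append(path)
    return missing
```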
Preserving Cartographic Representation • The true counterpart of the old map is not the GIS dataset but the cartographic representation built on that data: • Intellectual choices about symbolization, layer combinations • Data models, analysis, annotations • Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration • Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a separate problem
Preserving Cartographic Representation
Preserving Cartographic Representation • Image-based approaches (“desiccated data”) • Generate images using Map Book or similar tools • Harvest existing atlas images • Capture atlases from WMS servers • Export ‘layouts’ or ‘maps’ to image • Vector-based approaches • Store explicitly in the data format (e.g. Feature Class Representation in ArcGIS 9.2) • Archive and upward-migrate existing .avl, .apr, .lyr, .mxd, etc. files • SVG, VML or other XML approaches • Other?
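A sketch of the SVG approach, assuming polygon rings are already in map units: the symbolization choice (fill, stroke) is written as SVG presentation attributes next to the geometry, so the rendering decision travels with the data. `polygons_to_svg` is a hypothetical helper; it flips the y axis because SVG's y axis points down.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def polygons_to_svg(rings, bbox, style, size=400):
    """Render polygon rings [(x, y), ...] given in map units as an SVG string.

    bbox is (minx, miny, maxx, maxy); style is a dict of SVG presentation
    attributes (e.g. fill, stroke) that records the symbolization choice.
    """
    minx, miny, maxx, maxy = bbox
    scale = size / max(maxx - minx, maxy - miny)  # uniform scale keeps shapes undistorted
    # xmlns is set as an ordinary attribute so the output uses SVG as its default namespace.
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "width": f"{(maxx - minx) * scale:.2f}",
        "height": f"{(maxy - miny) * scale:.2f}",
    })
    layer = ET.SubElement(svg, "g", style)  # one group carries the layer's symbology
    for ring in rings:
        # SVG y grows downward while map y grows upward, so measure y from maxy.
        points = " ".join(f"{(x - minx) * scale:.2f},{(maxy - y) * scale:.2f}" for x, y in ring)
        ET.SubElement(layer, "polygon", points=points)
    return ET.tostring(svg, encoding="unicode")
```

This captures the symbols themselves but not what they meant to their original audience, which remains the separate problem noted on the earlier slide.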
Preserving Cartographic Representation
Preserving Cartographic Representation
Preserving Geodatabases • Not just data layers and attributes, but also topology, annotation, relationships, behaviors • ESRI Geodatabase archival issues • XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication • Growing use of geodatabases by municipal and county agencies • Some looking to the Geodatabase as an archival platform (in addition to feature class export)
Geodatabase Availability • According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in geodatabase format.
Evolving Geodatabase Handling Approaches
Harnessing Geospatial Web Services • Automated content identification • ‘capabilities files,’ registries, catalog services • WMS (Web Map Service) for batch extraction of image atlases • last ditch capture option • preserve cartographic representation • retain records of decision-making process • … feature services (WFS) later. • Rights issues in the web services space are ambiguous
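Batch extraction from a WMS amounts to generating many GetMap requests over a tiled extent. A minimal sketch, assuming WMS 1.1.1 (where EPSG:4326 bounding boxes are ordered longitude, latitude) and a hypothetical endpoint; fetching the images, and confirming that the provider's terms permit harvesting, are left out.

```python
from urllib.parse import urlencode


def getmap_urls(endpoint, layer, bbox, rows, cols, width_px=1024,
                srs="EPSG:4326", fmt="image/png"):
    """Split bbox (minx, miny, maxx, maxy) into rows x cols tiles and return one
    WMS 1.1.1 GetMap URL per tile, ordered row by row from the top-left."""
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    # Keep pixels square in map units (degrees here, not ground distance).
    height_px = max(1, round(width_px * dy / dx))
    urls = []
    for r in range(rows):
        for c in range(cols):
            tile = (minx + c * dx, maxy - (r + 1) * dy, minx + (c + 1) * dx, maxy - r * dy)
            params = {
                "SERVICE": "WMS",
                "VERSION": "1.1.1",
                "REQUEST": "GetMap",
                "LAYERS": layer,
                "STYLES": "",  # empty means the server's default style
                "SRS": srs,    # WMS 1.1.1 uses SRS; 1.3.0 renamed it CRS
                "BBOX": ",".join(f"{v:.6f}" for v in tile),
                "WIDTH": width_px,
                "HEIGHT": height_px,
                "FORMAT": fmt,
            }
            urls.append(f"{endpoint}?{urlencode(params)}")
    return urls
```

Saving the generated URLs alongside the images is one cheap way to retain a record of how each atlas page was captured.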
Partnerships • ESRI • Discussing software requirements: meetings with development teams April 2005 • Open Geospatial Consortium (OGC) • Meet with Architecture Working Group Nov. 2005 • National Archives and Records Administration • Investigations into GML for archiving; planned presentation to NARA technology team • FGDC Historical Data Working Group • General geospatial data preservation issues
Partnerships • EDINA (University of Edinburgh, UK) • NCSU is Associate Partner on UK project for geospatial institutional repositories • UC Santa Barbara & Stanford University • Other NDIIPP geospatial project • EROS Data Center • Planned site visit • Project visits to regional GIS groups • Albemarle Regional GIS meeting Nov. 3 • More planned …
Progress to Date • Completion of project agreements • Hiring staff • Acquisition and deployment of storage system (12.4 TB capacity – two 16.8 TB systems) • Testing and deployment of repository software • Development of metadata workflow • Development of ingest workflow • Pilot project with NC Geologic Survey data … Initial focus on developing the “plumbing”
Questions for You? • What are your current practices for: • Archiving data and managing time versions • Managing geodatabase versions • Transfer mechanisms for data • to regional entities? • to off-site storage for disaster recovery? • Archiving project files and finished products • What rights issues exist with regard to putting county and city data into an archive? • What would you like this project to do?
Ways to Participate in NCGDAP • Identifying data for inclusion in the repository • Discussing data format strategies • Sharing ideas about archiving approaches and architectures • Sharing and identifying concerns about rights issues, liability, etc. • Host project visits to regional GIS groups • Use Local Government GIS listserv to discuss preservation issues?
Questions? Contact: Steve Morris Head, Digital Library Initiatives NCSU Libraries Steven_Morris@ncsu.edu http://www.lib.ncsu.edu/ncgdap