Long-term archiving of geospatial data: the NGDA project Julie Sweetkind-Singer John Banning Stanford University
The Library of Congress and NDIIPP • $100 million from Congress, Dec. 2000. • 1st round of funding announced Sept. 30, 2004. • 8 grants funded for nearly $14 million. • 2nd round of funding announced May 6, 2005. • 10 awards totaling $3 million (in conjunction with NSF).
Funded Geospatial Projects Total of both awards: $3.1 million
What is meant by digital preservation? • “Reliable long-term access to managed digital resources to its designated communities, now and in the future.” (RLG/OCLC, 2002) • Trusted digital repository attributes
Key non-technical elements • Collection development • Assessing scope • Assessing risk • Contracts • Rights / use of materials • Cost of acquiring data • Increasing the size of the collecting network
Key technical elements • Large data sets • Versioning • Variety and complexity of formats • Proprietary file formats • Need for format information and specifications • Federation
External contacts • California Spatial Information Library (CASIL) • David Rumsey Collection • California Geological Survey • Katrina Image Warehouse • DigitalGlobe and GeoEye • ESRI
Technical Architecture: UCSB • [Architecture diagram; labels recovered from the slide: access • ingest • Web • ADL • OAI • bulk loader • archival system • storage subsystem • standard, public data model • databases, caches, etc.]
What is a format? • “A serialization of an abstract information model” • A set of syntactic and semantic rules for mapping from an information model to a byte stream (and, in most instances, for mapping back). • Without knowledge of its format, a digital object is merely a collection of undifferentiated bits.
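The definition above can be made concrete with a minimal sketch. It assumes a toy information model (a point with two coordinates) and a made-up format rule (two little-endian 64-bit floats); neither comes from the slides. The same 16 bytes read under a different rule yield unrelated numbers, which is the "undifferentiated bits" problem in miniature.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Toy information model (hypothetical): one coordinate pair."""
    x: float
    y: float


# Hypothetical format rule: two little-endian 64-bit floats, x then y.
POINT_LAYOUT = "<dd"


def serialize(p: Point) -> bytes:
    """Map the information model to a byte stream."""
    return struct.pack(POINT_LAYOUT, p.x, p.y)


def deserialize(data: bytes) -> Point:
    """Map the byte stream back to the information model."""
    x, y = struct.unpack(POINT_LAYOUT, data)
    return Point(x, y)


def misread_as_integers(data: bytes) -> tuple:
    """Read the same bytes under the wrong rule (four unsigned 32-bit ints)."""
    return struct.unpack("<4I", data)


point = Point(-122.17, 37.43)
raw = serialize(point)
restored = deserialize(raw)      # equal to the original point
garbage = misread_as_integers(raw)  # four integers with no geographic meaning
```

Knowing the layout string is what turns `raw` back into a `Point`; a format registry exists to keep that kind of knowledge available.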
What is a Format Registry? • Definition • The registry is a central location where format information is stored and maintained in a controlled manner. • This includes: Identifiers, Responsibility, Classification, Relationships, Specifications, Signatures, Grammar, Tools, and Assessment • Why do we need one? • Formats become obsolete over time • Need machine-actionable validation of the format.
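One way to picture a registry record is as a structure with one field per item in the list above. The sketch below is illustrative only: the class names, field types, and in-memory lookup are assumptions, not the NGDA or GDFR data model. The sample values are taken from the shapefile slides in this deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FormatEntry:
    """Hypothetical registry record; one field per item named on the slide."""
    identifiers: List[str]
    responsibility: str
    classification: str = ""
    relationships: List[str] = field(default_factory=list)
    specifications: List[str] = field(default_factory=list)
    signatures: List[bytes] = field(default_factory=list)
    grammar: str = ""
    tools: List[str] = field(default_factory=list)
    assessment: str = ""


class FormatRegistry:
    """Hypothetical in-memory registry keyed by identifier (case-insensitive)."""

    def __init__(self) -> None:
        self._by_identifier: Dict[str, FormatEntry] = {}

    def register(self, entry: FormatEntry) -> None:
        for identifier in entry.identifiers:
            self._by_identifier[identifier.lower()] = entry

    def lookup(self, identifier: str) -> Optional[FormatEntry]:
        return self._by_identifier.get(identifier.lower())


registry = FormatRegistry()
shapefile = FormatEntry(
    identifiers=[".shp"],
    responsibility="ESRI",
    specifications=[
        "ESRI Shapefile Technical Description white paper",
        "dBase specification",
    ],
    tools=["ArcGIS", "ArcView 3.0"],
)
registry.register(shapefile)
found = registry.lookup(".SHP")  # returns the shapefile entry
```

A production registry would also need persistent identifiers and change control; the sketch only shows the shape of a record and a lookup.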
Goals of a Format Registry • Interpret the information content of a digital object properly. • Effective use, interchange, and preservation of all digitally encoded content.
Current Efforts in Format Registries • Global Digital Format Registry (GDFR) • Digital Formats Web (Library of Congress) • PRONOM (UK) • NGDA (geospatial) • Long Now Foundation
Geospatial Example: ESRI Shapefile • ESRI Shapefile Technical Description white paper • dBase specification • Reference to different geospatial metadata standards • Additional documentation, specifications or statements on the various files that may be used as part of shapefiles (.sbn, .sbx, .prj, .xml, .fbn, .fbx)
Geospatial Example: ESRI Shapefile • Identifiers – “.shp” • Responsibility – ESRI, 380 New York Street, Redlands, CA 92373 • Tools – ArcGIS, ArcView 3.0, etc. • Link to existing Format Registry: http://www.ngda.org/format/
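A file extension alone is a weak identifier, which is why a registry also records signatures. As a hedged sketch of machine-actionable validation, the check below uses the main-file header layout described in the ESRI Shapefile Technical Description: a 100-byte header with file code 9994 (big-endian) at byte 0 and version 1000 (little-endian) at byte 28. The function names and the synthetic header are illustrative assumptions, not part of the NGDA registry.

```python
import struct

SHP_HEADER_SIZE = 100  # main-file header length in bytes
SHP_FILE_CODE = 9994   # bytes 0-3, big-endian 32-bit integer
SHP_VERSION = 1000     # bytes 28-31, little-endian 32-bit integer


def looks_like_shapefile(header: bytes) -> bool:
    """Return True if the bytes carry the .shp header signature."""
    if len(header) < SHP_HEADER_SIZE:
        return False
    (file_code,) = struct.unpack(">i", header[0:4])
    (version,) = struct.unpack("<i", header[28:32])
    return file_code == SHP_FILE_CODE and version == SHP_VERSION


def make_test_header(shape_type: int = 1) -> bytes:
    """Build a synthetic, record-free header for demonstration only."""
    header = bytearray(SHP_HEADER_SIZE)
    struct.pack_into(">i", header, 0, SHP_FILE_CODE)
    # File length is stored in 16-bit words, big-endian, at byte 24.
    struct.pack_into(">i", header, 24, SHP_HEADER_SIZE // 2)
    struct.pack_into("<i", header, 28, SHP_VERSION)
    struct.pack_into("<i", header, 32, shape_type)  # 1 = Point
    return bytes(header)


valid = looks_like_shapefile(make_test_header())
invalid = looks_like_shapefile(bytes(SHP_HEADER_SIZE))
```

To check a real file, read its first 100 bytes and pass them in: `looks_like_shapefile(open(path, "rb").read(100))`. A signature match confirms only the header; it does not validate the records or the companion .shx and .dbf files.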
Goals of the Project • Create robust preservation environments • Save at-risk data • Write collection development policy • Start a geospatial format registry • Develop guidelines for preservation of geospatial materials • Agree upon guidelines for participation in the NGDA
Relevant contact information • Julie Sweetkind-Singer • sweetkind@stanford.edu • John Banning • jbanning@stanford.edu • NGDA Web site • www.ngda.org • NDIIPP Web site • www.digitalpreservation.gov