Data Archiving & Preservation: Best Practices for GIS. Presentation to ARGIS - Atlanta Region GIS User Group October 30, 2013. Jennifer Doty | jennifer.doty@emory.edu Data Management Specialist Emory Center for Digital Scholarship. Overview. Best practices for managing geospatial data:
Data Archiving & Preservation: Best Practices for GIS
Presentation to ARGIS - Atlanta Region GIS User Group, October 30, 2013
Jennifer Doty | jennifer.doty@emory.edu
Data Management Specialist, Emory Center for Digital Scholarship
Overview
Best practices for managing geospatial data:
• File formats
• Naming conventions
• Folder structure
• Storage and backup
• Documentation
Trends in geospatial data archiving:
• Federal funding agencies’ requirements
• State initiatives for preservation
Best Practices: File Formats UK Data Archive File Formats guide, http://www.data-archive.ac.uk/create-manage/format/formats-table
Best Practices: File Formats
GeoMAPP Geospatial Data File Formats Reference Guide:
• provides quick reference of common geospatial raster and vector dataset types
• serves as tool to identify geospatial format types based on file extensions
• also includes information on standards and specifications for documenting geospatial data
http://www.geomapp.net/docs/GeoMAPP_Geospatial_data_file_formats_FINAL_20110701.xls
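Identifying a format from a file extension can be sketched as a simple lookup. The table below is a small illustrative subset written for this example; it is not copied from the GeoMAPP spreadsheet, and the function name `identify_format` is hypothetical.

```python
from pathlib import Path

# Illustrative subset only; not taken from the GeoMAPP reference guide.
FORMAT_BY_EXTENSION = {
    ".shp": ("ESRI Shapefile", "vector"),
    ".kml": ("Keyhole Markup Language (KML)", "vector"),
    ".gml": ("Geography Markup Language (GML)", "vector"),
    ".geojson": ("GeoJSON", "vector"),
    ".tif": ("GeoTIFF / TIFF", "raster"),
    ".tiff": ("GeoTIFF / TIFF", "raster"),
    ".img": ("ERDAS IMAGINE", "raster"),
    ".sid": ("MrSID", "raster"),
    ".jp2": ("JPEG 2000", "raster"),
}


def identify_format(filename):
    """Return (format name, data model) for a file name, or None if unknown."""
    extension = Path(filename).suffix.lower()  # case-insensitive match
    return FORMAT_BY_EXTENSION.get(extension)


print(identify_format("Census2010_blockgroups_GA.shp"))
```

An extension alone is only a hint: a shapefile, for example, is a set of sibling files (.shp, .shx, .dbf and others), so a real inventory would likely group files by stem before classifying them.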
Best Practices: Naming Conventions
• Create meaningful but brief naming conventions for your project
• Use file names to classify broad types of files
• Avoid using spaces and special characters
• Begin names with letters, not numbers, e.g. Census2010_blockgroups_GA, not 2010Census…
• Avoid very long file names
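The rules above can be checked automatically. This is a minimal sketch, assuming that "special characters" means anything other than ASCII letters, digits, underscores, and hyphens, and that 40 characters counts as "very long"; both thresholds are assumptions, as is the function name `check_name`.

```python
import re

# Assumed limit; the guidance only says to avoid very long names.
MAX_STEM_LENGTH = 40


def check_name(stem):
    """Return a list of problems found in a file name (without extension)."""
    if not stem:
        return ["name is empty"]
    problems = []
    if not stem[0].isalpha():
        problems.append("begins with a non-letter")
    if " " in stem:
        problems.append("contains spaces")
    # Anything outside letters, digits, underscore, space and hyphen.
    if re.search(r"[^A-Za-z0-9_ -]", stem):
        problems.append("contains special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("longer than %d characters" % MAX_STEM_LENGTH)
    return problems


print(check_name("Census2010_blockgroups_GA"))  # no problems expected
print(check_name("2010Census_blockgroups_GA"))
```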
Best Practices: Naming Conventions
Example: keyword_steward_extent_date.ext
• Keyword (essential): be as descriptive of the contents of the data as possible by using a word or short phrase
• Steward (essential): either the creator of the dataset or the last one to make a significant modification to a dataset
• Extent (optional): may be included to indicate resolution of the data (e.g. county, state, or international)
• Date (optional): may be used to indicate the date of creation or the age range of the content. Recommended format is YYYYMMDD
Indiana Geographic Information Council, http://www.igic.org/standards/namingstandard.pdf
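The keyword_steward_extent_date.ext pattern can be assembled in a few lines. This sketch is not part of the IGIC standard; the function `build_name` and the sample values ("roads", "GDOT", "state") are hypothetical.

```python
from datetime import date


def build_name(keyword, steward, extent=None, when=None, ext="shp"):
    """Assemble keyword_steward[_extent][_date].ext."""
    if not keyword or not steward:
        raise ValueError("keyword and steward are essential")
    parts = [keyword, steward]
    if extent:
        parts.append(extent)  # optional component
    if when:
        parts.append(when.strftime("%Y%m%d"))  # recommended YYYYMMDD form
    return "_".join(parts) + "." + ext.lstrip(".")


print(build_name("roads", "GDOT", "state", date(2013, 10, 30)))
# roads_GDOT_state_20131030.shp
```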
Best Practices: Naming Conventions
Versioning:
• useful to indicate file revisions or edits, especially in collaborations
• can use discrete or continuous numbering, depending on whether revisions are minor or major
• think of software versioning: ArcGIS 10 was a significant change from 9.x, but ArcGIS 10.1 was a (relatively) minor change to 10
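One way to apply major and minor revisions to file names is a version suffix that gets incremented on each edit. The `_vMAJOR-MINOR` suffix used here is an assumption chosen to avoid extra dots in the name; the slides do not prescribe a scheme, and `bump_version` is a hypothetical helper.

```python
import re

# Assumed suffix scheme: _v<major>-<minor> at the end of the file stem.
_VERSION = re.compile(r"_v(\d+)-(\d+)$")


def bump_version(stem, major=False):
    """Return the stem with its trailing version suffix incremented."""
    match = _VERSION.search(stem)
    if match is None:
        return stem + "_v1-0"  # first versioned copy
    big, small = int(match.group(1)), int(match.group(2))
    if major:
        big, small = big + 1, 0  # a major revision resets the minor number
    else:
        small += 1
    return "%s_v%d-%d" % (stem[:match.start()], big, small)


print(bump_version("roads_GDOT_v1-4"))              # roads_GDOT_v1-5
print(bump_version("roads_GDOT_v1-4", major=True))  # roads_GDOT_v2-0
```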
Best Practices: Folder Structure
• Separate directories for scratch workspace and final data
• Hierarchy: is deep or shallow best for your project?
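Separating scratch and final data is easy to set up once and reuse. The layout below is a hypothetical, fairly shallow example; the folder names are assumptions and should be adapted to the project.

```python
from pathlib import Path

# Hypothetical layout: one scratch area, final data split by data model.
LAYOUT = ("scratch", "final/vector", "final/raster", "docs")


def create_project(root):
    """Create the project skeleton under root and return the folder paths."""
    root = Path(root)
    created = []
    for relative in LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(folder)
    return created
```

Calling `create_project("census_project")` would create the four folders under `census_project` in the current directory; running it again leaves existing folders untouched.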
Best Practices: Storage & Backup
Storage
Considerations:
• Accessibility
• Read/Write speed
• Size limits: overall vs. file size
Options:
• Local: PC drive, flash drive, external hard drive
• Server: department/organization server space
• Cloud: Dropbox, Google Drive, etc.
Best Practices: Storage & Backup
Backup
Considerations:
• Accessibility (local, server, cloud)
• Redundancy (rule of thumb: here, near, far)
Options:
• Incremental/Snapshot
• Automated
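An incremental backup to more than one location can be sketched with the standard library alone. This is a simplified illustration, assuming each target is an ordinary directory (for example a mounted external drive or a synced cloud folder) located outside the source tree; it copies only new or changed files and never deletes anything. The function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def incremental_backup(source, targets):
    """Copy new or changed files from source into each target directory.

    Returns the total number of file copies made across all targets.
    """
    source = Path(source)
    copied = 0
    for target in map(Path, targets):
        for item in sorted(source.rglob("*")):
            if not item.is_file():
                continue
            dest = target / item.relative_to(source)
            if dest.exists() and sha256(dest) == sha256(item):
                continue  # unchanged since the last run
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, dest)  # copy2 also preserves timestamps
            copied += 1
    return copied
```

Running a function like this from a scheduler (cron, Task Scheduler) would cover the "automated" option; a snapshot approach would instead keep dated copies rather than overwriting changed files.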
Best Practices: Documentation “When thoughtfully populated, geospatial metadata can be a critical resource for understanding and managing geospatial data for current and future GIS practitioners and those trying to preserve the data.” -Utilizing Geospatial Metadata to Support Data Preservation Practices, January 2011, GeoMAPP (http://www.geomapp.net/publications_categories.htm)
Best Practices: Documentation
Metadata represents the who, what, when, where, why and how
Standards:
• CSDGM (FGDC)
• ISO 19115:2003 / ISO 19139
FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM) http://www.fgdc.gov/csdgmgraphical/index.html
Checklist: CSDGM Fields for Preservation
Identification Information - basic info about data set, including:
• party responsible: usually creator
• publication date: date the data set is completed and ready for use
• title: “where” “what” “when”
• maintenance/update frequency: annually, as needed, based on census, etc.
• bounding coordinates
• keywords (theme and place)
• access and use constraints: any restrictions, disclaimers, or guidance on data set attribution
• contact details
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
Checklist: CSDGM Fields for Preservation
Data Quality Information - provides historical lineage and source descriptions for the data used in the creation of the data set, including:
• originator
• publisher, publication date & place
• “currentness” of source data
• process description
Checklist: CSDGM Fields for Preservation
Spatial Reference Information - description of the reference frame for, and the means to encode, coordinates in the data set, including:
• map projection name
• coordinate system name
• unit of measure
• geodetic model: datum, ellipsoid
Checklist: CSDGM Fields for Preservation
Entity and Attribute Information - details about content of the data set: the entities, their attributes, and domains from which attribute values may be assigned, including:
• entity label
• attribute label and description
Checklist: CSDGM Fields for Preservation
Metadata Reference Information - information on the party responsible for creating the metadata and the currentness of the metadata:
• metadata standard name
• metadata standard version
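A checklist like this can be applied to an existing CSDGM XML record with a short script. The sketch below covers only part of the checklist and uses the CSDGM short tag names as best understood here (for example `idinfo`, `citeinfo`, `metstdn`); treat the paths as assumptions and verify them against the FGDC standard before relying on them. The function name `missing_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed element paths (CSDGM short names), relative to the <metadata> root.
# Illustrative subset; not the complete GeoMAPP checklist.
REQUIRED_PATHS = {
    "originator": "idinfo/citation/citeinfo/origin",
    "publication date": "idinfo/citation/citeinfo/pubdate",
    "title": "idinfo/citation/citeinfo/title",
    "update frequency": "idinfo/status/update",
    "bounding coordinates": "idinfo/spdom/bounding",
    "theme keyword": "idinfo/keywords/theme/themekey",
    "access constraints": "idinfo/accconst",
    "use constraints": "idinfo/useconst",
    "process description": "dataqual/lineage/procstep/procdesc",
    "metadata standard name": "metainfo/metstdn",
    "metadata standard version": "metainfo/metstdv",
}


def _is_populated(element):
    """True if the element exists and has text or child elements."""
    if element is None:
        return False
    return bool((element.text or "").strip()) or len(element) > 0


def missing_fields(xml_text):
    """Return the checklist labels that are absent or empty in the record."""
    root = ET.fromstring(xml_text)
    return [label for label, path in REQUIRED_PATHS.items()
            if not _is_populated(root.find(path))]
```

Calling `missing_fields` on the text of a metadata record returns the labels still to be filled in; an empty list means every field in this subset is populated.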
Data Management Initiatives
Federal agency mandates for sponsored research:
• NSF & NIH requirements for DM plans
• GIS Inventory (Ramona) & Federal Grants data sharing plans: gisinventory.net
Other related initiatives:
• USGS DM working group
• DM training for early career researchers
FGDC Geospatial Data Lifecycle Model http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
State & National Initiatives in Geospatial Data Archiving
GeoMAPP - Geospatial Multistate Archive and Preservation Partnership (www.geomapp.net):
• federally funded partnership between the Library of Congress and state geospatial and archives staff from North Carolina, Kentucky, Montana, and Utah
National Digital Stewardship Alliance (NDSA), Geospatial Content Team (www.digitalpreservation.gov/ndsa):
• report identifying appraisal and selection activities as they affect decisions defining geospatial content of enduring value for the nation
Open GeoPortal @ Emory
NASA Goddard Photo and Video / CC BY
Contact Information:
Jennifer Doty | jennifer.doty@emory.edu, Data Management Specialist
Michael Page | michael.page@emory.edu, Geographer & Geospatial Data Librarian
Emory Center for Digital Scholarship
digitalscholarship.emory.edu