Enhanced Data Description for End Users
ScribeKey, LLC
Brian Hebert, Solutions Architect
www.scribekey.com
ScribeKey Project Experience
• Global FGDC Metadata production for large commercial data provider(s)
• Federal Agency Assistance: Assess, describe, and standardize a large collection of geospatial datasets
• Experience with data cleansing, metadata, integration, presentation, and application development
Project scale: 200+ Countries, 72 Layers, 100s of Attributes, 100s of Domains, Quarterly Updates
Project scale: 50+ States, 400 Layers, 1000s of Attributes, 100s of Domains, Annual Updates
Goal: Make Data Easy to Understand and Use
• Data users today have more information than ever to keep track of.
• An individual provider's data may be just one part of a larger data use and mission.
• Learning about data can take considerable time and effort.
• How can we best help data customers understand and use data effectively?
• Reduce the learning curve.
Multiple Data Description Sources
Users learn how to use data through a variety of sources: the website, metadata, documentation, email, tech support, and the data itself.
Data Description Checklist
• Is there a Data User Guide? A glossary and index?
• Are primary data categories and entities fully described?
• Are all acronyms, abbreviations, and provider vocabulary terms explained?
• Are short, cryptic database field names and values explained?
• Are data types, lengths, keys, nulls allowed, formats, and lists clear enough to help the user form SQL queries?
• Is FGDC/ISO Metadata available?
• Are sample values and data profiles available?
• Are data presentations, maps, symbols, and reports prepared for a quick start?
• Is all this information in one place?
Complete metadata describes Meaning, Structure, and Contents. Maximize the end user's understanding to help them write queries and reports.
Solution: Lightweight HTML Data Dictionary
Full descriptions of data categories, entities, attributes, and domain values. Information is integrated from documentation, data profiles, metadata, and the data provider website. Available as standalone HTML or on a web site.
Dataset Overview
A Library Science Indexing/Abstracting approach is taken to ensure the most important and useful information is seen first. The focus here is on clearly describing top-level data categories, layers, and tables. Key data provider terminology and concepts are explained.
Layer and Table Details
Includes Name, Geometry Type, Definition, Attribute List, Keywords, and a link to standard FGDC/ISO Metadata. Drill down to review Attributes and Domains.
FGDC metadata is typically organized and accessed as a set of separate XML documents. ScribeKey’s approach integrates these separate documents, making all information available at a single access point.
Search/Highlight/Filter/Sort
Attributes and Domain Values
Core Data Info: All dataset metadata, including Data Type, Length, Format, Nulls Allowed, Primary and Foreign Keys, Join Information, Sample Values, and Percent Complete. This data profiling information is essential for end users who want to generate information products such as reports, maps, charts, and graphs from SQL queries.
Helping with the Data Provider/End User Communication Gap
Provider Language: “Impute FROMHN EDGES ADDRFN Internal Point MTFCC S1100”
User Language: “Layer Table Attribute Map Symbol Centroid Join Report”
Data providers and users have different languages and understandings of data. The use of keywords, aliases, and definitions in the data dictionary helps bridge this gap by providing a translation.
How Does Data Profiling Help?
An essential tool for enhanced metadata: it shows the end user actual sample values, data types, lengths, formats, percent complete, etc. This valuable contents information is typically not found in metadata.
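As a rough illustration of what a profile might contain, here is a minimal Python sketch that computes an inferred type, maximum length, percent complete, and sample values for one attribute. The function name `profile_column`, its output keys, and the sample values are assumptions made for this example; they are not taken from ScribeKey's tooling.

```python
from collections import Counter

def profile_column(values, sample_size=3):
    """Profile one attribute (hypothetical helper, not ScribeKey's tool)."""
    # Treat None and blank strings as missing values.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    total = len(values)
    percent_complete = round(100.0 * len(non_null) / total, 1) if total else 0.0
    # Use the most common Python type name as the inferred data type.
    type_counts = Counter(type(v).__name__ for v in non_null)
    inferred_type = type_counts.most_common(1)[0][0] if type_counts else "unknown"
    max_length = max((len(str(v)) for v in non_null), default=0)
    # Distinct values in first-seen order serve as sample values.
    samples = list(dict.fromkeys(non_null))[:sample_size]
    return {
        "data_type": inferred_type,
        "max_length": max_length,
        "percent_complete": percent_complete,
        "sample_values": samples,
    }

# Illustrative values for a single column; two of five entries are missing.
column_values = ["S1100", "S1200", None, "S1100", ""]
profile = profile_column(column_values)
print(profile)
```

A real profiler would read values from the database and would likely add format patterns and value frequencies, but the same few measures answer most of the questions an end user has before writing a query.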
ScribeKey Metadata Generation
• Sample data is reviewed and profiled. Any existing metadata is imported into the repository.
• From the profile, existing user documentation, technical support staff, and the website, a metadata repository is populated and metadata document templates are developed.
• FGDC/ISO Metadata is generated, as XML/HTML reports, from the metadata repository.
Diagram labels: Metadata Repository, Metadata Templates, Metadata Export App; outputs: PDF, DOC, FGDC XML, HTML
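The export step can be sketched in Python as follows, assuming a repository record has already been loaded into a dictionary. The element names (`idinfo`, `eainfo`, `enttypl`, `attrlabl`, and so on) are standard FGDC CSDGM short names, but the function `build_fgdc_fragment`, the record layout, and the sample layer are hypothetical, and the output is only a fragment, not a complete or schema-valid FGDC record.

```python
import xml.etree.ElementTree as ET

def build_fgdc_fragment(layer):
    """Build a partial FGDC-style XML tree from one repository record (sketch)."""
    metadata = ET.Element("metadata")
    # Identification section: title and abstract.
    idinfo = ET.SubElement(metadata, "idinfo")
    citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    ET.SubElement(citeinfo, "title").text = layer["name"]
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = layer["definition"]
    # Entity and attribute section: one entity plus its attributes.
    detailed = ET.SubElement(ET.SubElement(metadata, "eainfo"), "detailed")
    enttyp = ET.SubElement(detailed, "enttyp")
    ET.SubElement(enttyp, "enttypl").text = layer["name"]
    ET.SubElement(enttyp, "enttypd").text = layer["definition"]
    for attribute in layer["attributes"]:
        attr = ET.SubElement(detailed, "attr")
        ET.SubElement(attr, "attrlabl").text = attribute["name"]
        ET.SubElement(attr, "attrdef").text = attribute["definition"]
    return metadata

# Hypothetical repository record for one layer.
layer = {
    "name": "Roads",
    "definition": "Hypothetical road centerline layer.",
    "attributes": [
        {"name": "ROAD_TYPE", "definition": "Hypothetical road classification code."},
    ],
}
xml_text = ET.tostring(build_fgdc_fragment(layer), encoding="unicode")
print(xml_text)
```

Keeping the record in one place and generating each output format from it is what allows the same content to be exported as XML, HTML, or other documents without retyping.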
Map, Query, Report Preparation
.MXD Preparation: Prepared for end user quick start. Can include symbol setup, joins/relates, maps, queries, and reports.
Metadata Layers: Use metadata to create GIS layers that allow a variety of map presentations, reports, etc. to summarize and highlight datasets by metadata values.
The Geospatial Metadata Repository
The Metadata Repository, implemented as an RDBMS, is populated with automated tools and then used to generate metadata outputs, data dictionary content, schemas, maps, etc.
Diagram labels: Data Dictionary, METADATA REPOSITORY, Data Layers, Enhanced User Views, Metadata Pivot Tables, Areas, Entities, Derivative Datasets, Documents, Meta-Maps, Assessments, Attributes, Domains, Schemas
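To make the idea of a relational metadata repository concrete, the following Python sketch uses an in-memory SQLite database with three tables for entities, attributes, and domain values, then queries them to produce data dictionary rows. The table names, column names, and sample records are all assumptions for illustration; the slides do not describe the actual repository schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: entities own attributes, attributes own domain values.
conn.executescript("""
CREATE TABLE entities (
    entity_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    geometry_type TEXT,
    definition TEXT
);
CREATE TABLE attributes (
    attribute_id INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entities(entity_id),
    name TEXT NOT NULL,
    data_type TEXT,
    max_length INTEGER,
    nulls_allowed INTEGER,
    definition TEXT
);
CREATE TABLE domain_values (
    attribute_id INTEGER NOT NULL REFERENCES attributes(attribute_id),
    domain_value TEXT NOT NULL,
    description TEXT
);
""")

# Hypothetical sample records.
conn.execute("INSERT INTO entities VALUES (1, 'Roads', 'Line', 'Hypothetical road layer')")
conn.execute(
    "INSERT INTO attributes VALUES (1, 1, 'ROAD_TYPE', 'TEXT', 5, 0, "
    "'Hypothetical classification code')"
)
conn.executemany(
    "INSERT INTO domain_values VALUES (?, ?, ?)",
    [(1, "A", "Hypothetical class A"), (1, "B", "Hypothetical class B")],
)

# One row per entity/attribute/domain value, ready to render as a data dictionary page.
rows = conn.execute("""
SELECT e.name, a.name, a.data_type, d.domain_value, d.description
FROM entities e
JOIN attributes a ON a.entity_id = e.entity_id
LEFT JOIN domain_values d ON d.attribute_id = a.attribute_id
ORDER BY e.name, a.name, d.domain_value
""").fetchall()
for row in rows:
    print(row)
```

The `LEFT JOIN` keeps attributes that have no coded domain, so free-text fields still appear in the generated dictionary.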
Recap: ScribeKey Data Description Support
• Generate or Upgrade FGDC/ISO Metadata
• Profile Data to provide the user with actual contents information
• Help develop Data User Guides (PDF) and Website Copy
• Help author Indexes, Abstracts, and Glossaries
• Integrate multiple, separate data description materials in a single lightweight HTML front end
• Help prepare ArcMap, .mxd, symbols, joins, reports, and maps
• Result: Data is as easy to understand and use as possible
About www.scribekey.com
• ScribeKey, LLC: Massachusetts Corporation
• Brian Hebert, PMP, 30+ years designing and building desktop and web DB/GIS solutions
• Extensive experience producing metadata and data dictionaries for data providers and end users
• Extensive experience with data integration, data quality assessments, data cleansing, ETL, and application development with ESRI/ArcObjects, .NET, SQL, XML, HTML
• Small focused teams, template approach, quick turnarounds, practical approach