Overview of Metadata Strategy
Kevin J. Kirby, Data Architect, US EPA
March 2008
Summary of Issues for Data Advisory Council
Direction from July 2007 Meeting
• Warehousing is only a means to an end.
  • Warehouses, ETL tools, data marts, and all the rest are of interest only to the extent that they promote sharing data, that is, making data available to users inside and outside the program.
  • “Enable to share” means enabling EPA to share data within programs, across programs, with partners, and with the public.
• Interoperability is expensive and long-term, and not all data needs to be shared.
  • Most EPA data is of interest only in a fairly narrow program context.
• We need to develop a warehousing approach based on a determination of:
  • Data we always need to share (facility, geospatial, substance)
  • Data we occasionally need to share
  • Data we may never need to share
Purpose and General Approach: Phase 1 (through April 14, 2008)
• Premise: “Enable to Share”
  • Internal to EPA
  • Between agencies (Environmental Line of Business)
  • With the public
• First priority: data object discovery and evaluation
  • What data is available?
  • How do I know whether it is adequate for my purpose?
  • How do I get it?
• Future priorities: understanding the details
  • Data elements, data models, transfer schemas, etc.
Data object registries are the first access point for discovery.
Proposed Metadata Framework for Data “Objects”
Objects include:
• DBMS data sets
• Unstructured data (e-mail, documents)
• Multimedia
• etc.
Metadata Framework for Discovery & Evaluation
• Categories of metadata help the user assess the value of a data set.
• Levels of metadata exist within an RDBMS data set, especially for evaluating quality and security issues.
• Standard taxonomies aid discovery. These might be specific to broad categories such as “Admin./Financial”. The EPA Data Classification is a start.
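As a rough illustration of how categorized metadata and a standard taxonomy could support discovery, the Python sketch below models a data object's metadata as a record. The record groups its metadata by framework category and carries taxonomy tags. A simple lookup then finds objects by taxonomy term. The `DataObjectMetadata` class, the `discover` helper, the category names, and the sample terms and objects are all hypothetical placeholders. They are not the EPA Data Classification or any actual registry schema.

```python
from dataclasses import dataclass, field


@dataclass
class DataObjectMetadata:
    """Hypothetical metadata record for one data object (DBMS data set, document, etc.)."""
    name: str
    object_type: str  # e.g. "RDBMS data set", "Unstructured", "Multimedia"
    # Metadata grouped by framework category; these category names are illustrative only.
    categories: dict[str, dict[str, str]] = field(default_factory=dict)
    # Terms drawn from an assumed standard taxonomy (e.g. "Admin./Financial").
    taxonomy_terms: set[str] = field(default_factory=set)


def discover(records: list[DataObjectMetadata], term: str) -> list[str]:
    """Return the names of data objects tagged with the given taxonomy term."""
    return sorted(r.name for r in records if term in r.taxonomy_terms)


# Hypothetical sample records.
records = [
    DataObjectMetadata(
        name="facility_registry",
        object_type="RDBMS data set",
        categories={"Business": {"steward": "hypothetical program office"}},
        taxonomy_terms={"Facility", "Geospatial"},
    ),
    DataObjectMetadata(
        name="grants_ledger",
        object_type="RDBMS data set",
        taxonomy_terms={"Admin./Financial"},
    ),
]

print(discover(records, "Facility"))  # ['facility_registry']
```

Keeping taxonomy terms separate from the category metadata reflects the slide's split between the two roles. Taxonomies drive discovery, while the categories support evaluation once an object has been found.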
Applying the Framework to the EDA
• Frequently shared vs. seldom shared
  • A-Level: frequently shared; complete application of the framework required
  • B-Level: less frequently shared; a subset of the framework required
  • C-Level: rarely or never shared; no requirements
• A- and B-Level data objects must be represented in at least one data object registry.
[Figure: tiers C, B, A, ranging from seldom shared, least rigorous (C) to frequently shared, most rigorous (A)]
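The tiering rule on this slide can be sketched as a small classification function plus a registry check. This minimal sketch assumes that sharing frequency is recorded as an estimated number of sharing requests per year. The presentation defines no numeric cut-offs, so the thresholds are hypothetical and would have to be set through governance. The `classify` and `registry_gaps` helpers and the sample object names are hypothetical as well.

```python
from enum import Enum


class Level(Enum):
    A = "frequently shared: complete framework required"
    B = "less frequently shared: subset of framework required"
    C = "rarely or never shared: no requirements"


# Hypothetical thresholds in sharing requests per year; not defined in the source.
A_THRESHOLD = 12
B_THRESHOLD = 1


def classify(shares_per_year: int) -> Level:
    """Assign a sharing level from an estimated sharing frequency."""
    if shares_per_year >= A_THRESHOLD:
        return Level.A
    if shares_per_year >= B_THRESHOLD:
        return Level.B
    return Level.C


def registry_gaps(objects: dict[str, int], registered: set[str]) -> list[str]:
    """List A- and B-level objects that are not represented in any data object registry."""
    return sorted(
        name
        for name, freq in objects.items()
        if classify(freq) in (Level.A, Level.B) and name not in registered
    )


# Hypothetical objects mapped to estimated shares per year.
objects = {"facility": 50, "permit_notes": 2, "scratch_table": 0}
print(registry_gaps(objects, registered={"facility"}))  # ['permit_notes']
```

C-level objects never appear in the gap list, which matches the slide's "no requirements" rule for data that is rarely or never shared.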
Data Object Level Metadata for Sharing
• A-Level: entire framework required
• B-Level:
  • Business
  • Security/Sensitivity
  • Location & Access
  • ETL and Admin/Transaction data only if available
  • Data Set and Data Profile
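A registry could test whether a data object's metadata meets its level's requirements by comparing the categories present against a required set. The Python sketch below makes two assumptions. First, it assumes the full framework consists of the category labels shown on this slide. Second, it reads the B-Level list as making ETL and Admin/Transaction metadata optional ("only if available"). Both the category split and the `missing_categories` helper are illustrative assumptions, not a published EPA rule.

```python
# Assumed framework categories, taken from the labels on this slide.
FRAMEWORK = {
    "Business",
    "Security/Sensitivity",
    "Location & Access",
    "ETL",
    "Admin/Transaction",
    "Data Set and Data Profile",
}

# Assumed reading of the slide: ETL and Admin/Transaction are optional at B-Level,
# and C-Level objects carry no requirements.
REQUIRED = {
    "A": FRAMEWORK,
    "B": FRAMEWORK - {"ETL", "Admin/Transaction"},
    "C": set(),
}


def missing_categories(level: str, present: set[str]) -> list[str]:
    """Return required categories absent from a data object's metadata record."""
    return sorted(REQUIRED[level] - present)


record = {"Business", "Security/Sensitivity", "Location & Access"}
print(missing_categories("B", record))  # ['Data Set and Data Profile']
print(missing_categories("A", record))  # ['Admin/Transaction', 'Data Set and Data Profile', 'ETL']
```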
Data Object Registry Candidates: Coverage Is Incomplete
Candidate Data Object Registries
Only Informatica appears to manage all framework metadata categories, but it applies only to the data objects that it manages.
[Figure: proposed metadata framework categories]
Federated Registries with a Common Front-End Search Tool: Conceptual Architecture Using Faceted Search
Conceptual Federated Search Architecture
The major gap is RDBMS data sets not managed by Informatica.
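A federated front end can be illustrated as a single query that fans out to several registries. The front end then merges the hits and counts facet values the user can click to narrow the results. In this minimal sketch each registry is an in-memory list of records. The registry names, record fields, and the `federated_search` function are hypothetical stand-ins for the real registries and their search interfaces, which the presentation does not specify.

```python
from collections import Counter
from typing import Optional

# Hypothetical registries standing in for real data object registries.
# Each registry is modeled as a list of simple metadata records.
REGISTRIES = {
    "registry_one": [
        {"name": "facility_registry", "type": "RDBMS data set", "topic": "Facility"},
        {"name": "site_photos", "type": "Multimedia", "topic": "Facility"},
    ],
    "registry_two": [
        {"name": "substance_list", "type": "RDBMS data set", "topic": "Substance"},
    ],
}

FACETS = ("type", "topic", "registry")


def federated_search(text: str, filters: Optional[dict[str, str]] = None):
    """Send one query to every registry, apply facet filters, and count facet values."""
    query = text.lower()
    filters = filters or {}
    hits = []
    for registry, records in REGISTRIES.items():
        for rec in records:
            # Keyword match against the record's name or topic.
            if query not in rec["name"].lower() and query not in rec["topic"].lower():
                continue
            # Tag each hit with its source registry so the registry can serve as a facet.
            hit = {**rec, "registry": registry}
            # Keep only hits that match every selected facet value.
            if all(hit.get(k) == v for k, v in filters.items()):
                hits.append(hit)
    # Facet counts let the front end offer ways to narrow the result set.
    facets = {f: Counter(h[f] for h in hits) for f in FACETS}
    return hits, facets


hits, facets = federated_search("facility")
print([h["name"] for h in hits])  # ['facility_registry', 'site_photos']
hits, _ = federated_search("facility", {"type": "Multimedia"})
print([h["name"] for h in hits])  # ['site_photos']
```

A data object that is not held in any registry never reaches the front end. This is why RDBMS data sets outside every registry remain a gap, however good the search tool is.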
Governance Artifacts to Implement This Framework: A National Data Policy Modeled after NGDP
Governance: High-Level Artifacts