Baseline Findings: EPA Enterprise Data Architecture / Data Management Metadata. Kevin Kirby, Enterprise Data Architect / EPA Enterprise Architecture Team. Kirby.Kevin@epamail.epa.gov, (202) 566-1656
Agenda • Present cross-federal context for metadata study at EPA • Discuss related terms/concepts/taxonomies • Present and vet approach for proposed Metadata Maturity Model • Present Metadata Mapping study • Next steps
Types of Data Transactional data • Dollars earned or units sold Reference data • Entity by which transactions are measured • ‘Country’, ‘Prefix’ and ‘Industry’ Master data • Single version of the truth • Key corporate reference entities like ‘Customer’, ‘Location’ and ‘Product’ Metadata • Describes objects by connecting objects to the subjects they are about
Types of Metadata Technical: data sources, access protocol (ODBC, JDBC, SQL*NET, etc.), physical schema (database definition, table definition, column definition, etc.), logical data source (ER models, object models, etc.) Business: contextual data about the information retrieved; taxonomies that define business organizations and product hierarchies; controlled vocabulary or reference data used to define business terms, such as a medical dictionary or financial terminology.
Metadata & Related Terms • Metadata describes objects by connecting objects to the subjects they are about • Controlled vocabulary is a closed list of subjects that can be used for classification • Taxonomy is a subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy • Thesauri take taxonomies and extend them to make them better able to describe the world
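The relationship between the first three terms can be sketched in a few lines of Python. This is only an illustration: the subject terms and the `classify` and `ancestors` helpers are hypothetical and do not come from the EPA study.

```python
# Controlled vocabulary: a closed list of subjects usable for classification.
# All terms here are hypothetical examples, not EPA's vocabulary.
CONTROLLED_VOCABULARY = {
    "environment", "water", "air", "drinking water", "air quality",
}

# Taxonomy: the same terms arranged into a hierarchy (term -> parent term).
# None marks the root of the hierarchy.
TAXONOMY = {
    "environment": None,
    "water": "environment",
    "air": "environment",
    "drinking water": "water",
    "air quality": "air",
}


def classify(metadata, subject):
    """Attach a subject to an object's metadata, rejecting unlisted terms."""
    if subject not in CONTROLLED_VOCABULARY:
        raise ValueError(f"{subject!r} is not in the controlled vocabulary")
    metadata.setdefault("subjects", []).append(subject)
    return metadata


def ancestors(term):
    """Return the broader terms above `term`, nearest first."""
    path = []
    parent = TAXONOMY[term]
    while parent is not None:
        path.append(parent)
        parent = TAXONOMY[parent]
    return path


report = classify({"title": "Example report"}, "drinking water")
print(report["subjects"], ancestors("drinking water"))
```

Because the vocabulary is closed, a free-text subject such as "soil" is rejected rather than silently added, which is what keeps the classification consistent across contributors.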
Taxonomy • Metadata can be organized using a taxonomy • Helps an audience find information more easily • Blue lines reflect metadata; black lines reflect taxonomy • Blue lines – metadata about the paper • Black lines – subject-based taxonomy
Taxonomy Categorization Schemes [Figure: categorization schemes arranged on a scale from hardest to easiest]
Thesauri (e.g. ISO 2788) • BT (Broader Term) - refers to the term above this one in the hierarchy • SN (Scope Note) - a string attached to the term explaining its meaning • USE - refers to another term that is preferred to this term • TT (Top Term) - refers to the topmost ancestor • RT (Related Term) - refers to a term, related to this term, without being a synonym
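A minimal sketch of how these relations might be stored and traversed follows. The `Term` class, the helper functions and the sample terms are all hypothetical; the top term (TT) is derived here by walking BT links rather than stored, which is one possible design and not something the standard requires.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    name: str
    bt: Optional[str] = None       # BT: the term directly above this one
    sn: str = ""                   # SN: scope note explaining the meaning
    use: Optional[str] = None      # USE: preferred term to use instead
    rt: List[str] = field(default_factory=list)  # RT: related, not synonyms


# Hypothetical sample entries for illustration only.
THESAURUS: Dict[str, Term] = {
    t.name: t
    for t in [
        Term("pollutants", sn="Substances that contaminate air, water or soil"),
        Term("pesticides", bt="pollutants"),
        Term("herbicides", bt="pesticides", rt=["agriculture"]),
        Term("weed killers", use="herbicides"),
        Term("agriculture"),
    ]
}


def preferred(name):
    """Follow USE links until a preferred term is reached."""
    term = THESAURUS[name]
    while term.use is not None:
        term = THESAURUS[term.use]
    return term.name


def top_term(name):
    """TT: walk BT links upward to the topmost ancestor."""
    term = THESAURUS[preferred(name)]
    while term.bt is not None:
        term = THESAURUS[term.bt]
    return term.name


print(preferred("weed killers"), top_term("weed killers"))
```

Resolving USE before walking BT means a non-preferred entry such as "weed killers" inherits the hierarchy of its preferred term.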
Metadata Maturity Model WITH NO METADATA MGMT • Information is lost or hidden • Data integration is costly • Cannot support everyday business • Information is difficult to find • Partial & dated information • Loss of trust in data METADATA MANAGEMENT The organization of technical and business metadata with the goal to advance the sharing, retrieving and understanding of enterprise information assets.
Metadata Maturity Model Phase I: Ad Hoc PROCESS • Changes are locally acquired, made and consumed • Sharing through conversations with ‘incumbents’ • Infrequent changes TECHNOLOGY • Spreadsheets and unstructured tools • Application specific metadata components PEOPLE • Small group of rogue metadata warriors • Knowledge is in people’s heads • Sharing of metadata is ad-hoc
Metadata Maturity Model Phase II: Discovered PROCESS • Limited sharing of metadata • Local or semi-local repositories • Local attempts at managing metadata • Exploration of core metadata and metadata tools TECHNOLOGY • Modeling tools • Application specific metadata components • Some metadata management tools • Mix PEOPLE • Management awareness • Sporadic adding to various repositories • ‘Talk’ about importance of sharing metadata
Metadata Maturity Model Phase III: Managed PROCESS • Governance process is created and enforced • Workflows • Communication with ‘outside’ departments • Beginnings of real-time integration TECHNOLOGY • Metadata management tools with governance process • Workflow engine • Business rule engine • Data integration tools PEOPLE • Data stewards • Data governance body • Management understands importance of administering metadata
Metadata Maturity Model Phase IV: Integrated PEOPLE • Constantly seeking optimization • Metadata administrators – centralized validation PROCESS • Enterprise-level standards • Taxonomy, Ontologies, etc. • Authoritative data sources for entities TECHNOLOGY • Collaboration tools • Enterprise data modeling tool • Vocabulary and taxonomy management tool
Metadata Maturity Model Phase V: Optimized PEOPLE • Start managing metadata as part of business • Critical, ubiquitous, invisible part of the organization TECHNOLOGY • Ontology management • Reasoning technology • Data mediation PROCESS • Automated real-time integration • Domain ontologies & topic maps • Seamless integration at low cost
Metadata Management Framework v.2 • Metadata Category • Extendable baseline • Aids classification • Metadata Order • Intuitive faceted framework • Helps users find data • Metadata Taxonomy • Prioritizes metadata domains • Allows for domain extensions
Metadata Cross-Reference Expandable Set of Core Systems Flexible & Expandable Categories Domain-Specific Extensions • Small, core set, based on Dublin Core • Owned and maintained by Enterprise Architecture • Extensible by domain and core systems • In sync with data standards • Main purpose is to aid ‘findability’
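One way to picture a small core set that domains can extend is sketched below. The choice of core elements, the geospatial element names and the `extend_schema` helper are assumptions made for illustration; the slide does not list the actual core set.

```python
# Hypothetical core set (a small subset of Dublin Core element names).
CORE_ELEMENTS = frozenset({"title", "creator", "subject", "date", "identifier"})

# Hypothetical domain-specific extension for geospatial data.
GEOSPATIAL_EXTENSION = frozenset({"bounding_box", "coordinate_system"})


def extend_schema(core, extension):
    """Combine the core set with a domain extension.

    Extensions may add elements but may not redefine core ones, so the
    core stays under a single owner.
    """
    extension = frozenset(extension)
    clash = core & extension
    if clash:
        raise ValueError(f"extension redefines core elements: {sorted(clash)}")
    return core | extension


geospatial_schema = extend_schema(CORE_ELEMENTS, GEOSPATIAL_EXTENSION)
print(sorted(geospatial_schema))
```

Rejecting name clashes is a design choice in this sketch: it keeps every domain schema a strict superset of the core, so a search built on core elements works across all domains.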
The Dublin Core Standard • Created in 1995 to aid internet searches • Most common metadata standard • Every metadata description should describe just one information resource • 15 core data elements
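As an illustration, the 15 elements of the Dublin Core Metadata Element Set can be used to check a simple record. The `unknown_elements` helper and the dictionary layout are hypothetical; the sample record describes this presentation itself, using only details from its title slide.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record):
    """Return the keys of `record` that are not Dublin Core elements."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# One description for one information resource (this slide deck).
record = {
    "title": "Baseline Findings: EPA Enterprise Data Architecture / "
             "Data Management Metadata",
    "creator": "Kevin Kirby",
    "subject": "metadata",
}

print(unknown_elements(record))
print(unknown_elements({"title": "Example", "author": "Someone"}))
```

All Dublin Core elements are optional and repeatable, so the check only flags element names outside the set rather than requiring any particular field.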
• Many standards bodies exist • Content is modifiable • Extensions can be used & registered
Dublin Core Framework & Extensions • Domain-specific metadata extensions (e.g. geospatial) • Dublin Core adopted as standard • Metadata extensions for managing information through its lifecycle • Mandatory set of Common Look and Feel elements • Extensions for clusters and gateways
Data Governance Components • Data Stewards • Principal – ‘Guardians’ of Data • Business – Help define data and stewardship standards • Data Architects • Part of EA; Understand EA • Broker requests for new data and data changes • Responsible for enterprise-wide taxonomy • Data Advisory Committee (DAC) • Strategic • Managers & Execs • Broad representation • Infrastructure Team • Responsible for physical architecture and data provision • DBAs & Developers • Systems & Network Administrators
Value vs. Cost of Metadata [Chart annotations] • High awareness but no governance • ROI point: start of governance; right of Phase III • Sharp rise in cost for unmanaged metadata
Next Steps • Continue to expand the Baseline findings to include Core Mission Area Segments’ Data Stores • Vet findings and proposed metadata framework with System Owners and COI • Begin gathering requirements / considerations for metadata tools