Developing Enterprise Business Vocabularies (EBV). IMF Enterprise Information Architecture Team December 15, 2009. Agenda. Introduction Overview of EBV in production: Master Countries & Entities Vocabulary Master Regions & Groups Vocabulary Historical Country Names vocabulary Governance
Developing Enterprise Business Vocabularies (EBV) IMF Enterprise Information Architecture Team December 15, 2009
Agenda • Introduction • Overview of EBV in production: • Master Countries & Entities Vocabulary • Master Regions & Groups Vocabulary • Historical Country Names vocabulary • Governance • Developing an enterprise Topics Vocabulary
Introduction • Project: create and manage an enterprise metadata repository. • Goals: • improve the consistency of information description and dissemination • promote reuse of content • improve findability • Team: Shewan Workneh, Sharon Schmitt, Xiaoli Huang, Julie Contreras
Enterprise Business Vocabularies (EBV) • Represent the system-of-record for the Fund • Provide consistent values to core metadata elements like Countries, Economic Concepts, Regional Groups • Maintain a central point of control over the business semantics • Disseminate core elements and common values to applications via web services
EBV at a glance • For users: Workshop Web • For taxonomists & vocabulary owners: Workshop
Purpose of Topics Vocabulary (Why?) • Bring topics used in various venues to a central point • Create topics once, then reuse and share them Fund-wide • Connect topics with both structured content (data) and unstructured content (documents) • Broaden and complement each stakeholder’s perspective on a given topic • E.g., Commodity Prices, RES and AFR
Purpose of Topics Vocabulary (Why?) • Browsing, indexing, and tagging can rely on the same set of topic terms • Assumption: metadata compliance from various systems • Suggest authoritative topic terms at the time of authoring and allow authors to contribute new terms to the vocabulary • Implement a feed from TagXchange and offer the official repository for hosting user tags (new topics), to keep the topics vocabulary dynamic and growing
Framework for Structuring Topics • Initial investigation reveals that the Sectors as defined in DSBB (GDDS, SDDS) provide a framework that is acceptable Fund-wide: • KE aligns with the Sector Framework • CTS sectors align with the Sector Framework at the 2nd level • Advantage: Aggregating the structured and unstructured content • Decision point: Broad vs. Specific (see details next page) • Broad: Based on the broad sector groupings in DSBB • Specific: Based on the specific sectors in CTS • Open to other options with stakeholders’ input
Steps to Develop Topics Vocabulary (How?) • Step 1: Scoping the Sources • Step 2: Merging and Clustering • Step 3: Structuring • The most controversial and critical step • Manager’s advice is greatly needed • Step 4: Presenting • Step 5: Integrating with Fund Applications
Step 1: Scoping the Sources • Purpose: To pin down a comprehensive inventory of taxonomies, thesauri, departmental topic lists, and data dissemination topic metadata (e.g., SDDS) • Method: Research and discussion with stakeholders • Initial output: An integrated term file from 10 identified key sources: KE, FAD, ePublishing, DSBB, CTS, Thesaurus, FIN, MCM, SPR, and Legal, over 3000 terms in total
Step 2: Merging and Clustering • Purpose: To analyze and identify overlaps among topic terms from various sources in the inventory • Method: Merging and clustering terms based on their conceptual similarities: • Exact match (including Singular / Plural, Spelling variant, Acronym) • Synonym • Broader / Narrower • Related • Output: A list of term clusters • E.g., if we start with 2,000 terms, after merging and grouping we may end up with only 1,000 entries in the master topics vocabulary, each entry linked to a set of synonyms and spelling variants that refer to the same concept
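The "exact match" part of the method above can be sketched in a few lines of Python. This is only an illustration, not the team's actual procedure: the acronym table, the naive plural rule, and the pairing of sample terms with sources are all assumptions made for the example. Synonym, broader/narrower, and related links still need human review.

```python
import re
from collections import defaultdict

# Hypothetical acronym expansions; a real table would come from vocabulary owners.
ACRONYMS = {"gdp": "gross domestic product"}

def normalize(term: str) -> str:
    """Reduce a term to a comparison key (case, punctuation, acronym, naive plural)."""
    key = re.sub(r"[^a-z0-9 ]+", " ", term.lower())
    key = " ".join(key.split())
    key = ACRONYMS.get(key, key)
    words = []
    for word in key.split():
        # Naive singularization: drop a trailing "s" unless the word ends in "ss".
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        words.append(word)
    return " ".join(words)

def cluster_terms(terms):
    """Group (source, term) pairs whose normalized keys are equal."""
    clusters = defaultdict(list)
    for source, term in terms:
        clusters[normalize(term)].append((source, term))
    return dict(clusters)

# Illustrative input only; the source/term pairings are made up for the sketch.
sample = [
    ("KE", "Commodity Prices"),
    ("CTS", "commodity price"),
    ("FAD", "GDP"),
    ("Thesaurus", "Gross Domestic Product"),
    ("MCM", "Exchange rates"),
]
clusters = cluster_terms(sample)
for key, members in clusters.items():
    print(key, "<-", [term for _, term in members])
```

Each resulting cluster keeps the originating source next to every variant, so a reviewer can see which systems already share a concept before picking the preferred label.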
Step 2: Merging and Clustering Examples:
Step 3: Structuring • Purpose: To organize the term clusters into a topic hierarchy, which can be used • to arrange and aggregate site content by topic, and • to facilitate browsing and searching on topics Fund-wide • Method: • First decide on the top-level categories: the Sector Framework as suggested and discussed earlier • Arrange term clusters into appropriate top-level categories • Work out sub-levels and detailed structure within each category • Output: A topics hierarchy that is useful and acceptable to all the stakeholders
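As a rough sketch of what the resulting hierarchy might look like in code, the Python below places term clusters under top-level categories and lists every browse path. The category and topic labels are placeholders chosen for illustration; the actual top level depends on the Broad vs. Specific decision discussed earlier.

```python
from dataclasses import dataclass, field

@dataclass
class TopicNode:
    """One entry in the topic hierarchy; children are its narrower topics."""
    label: str
    children: list = field(default_factory=list)

    def add(self, label: str) -> "TopicNode":
        child = TopicNode(label)
        self.children.append(child)
        return child

def paths(node, prefix=()):
    """Yield the path from the root to the node and to every node below it."""
    here = prefix + (node.label,)
    yield here
    for child in node.children:
        yield from paths(child, here)

# Placeholder categories and topics, for illustration only.
root = TopicNode("Topics")
real = root.add("Real Sector")
external = root.add("External Sector")
prices = real.add("Prices")
prices.add("Commodity Prices")
external.add("Exchange Rates")

all_paths = list(paths(root))
for path in all_paths:
    print(" > ".join(path))
```

The same paths can drive both a browse menu and content aggregation, since any document or data series tagged with a narrower topic rolls up to its category.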
Step 3: Structuring Concern:
Step 4: Presenting • Purpose: To present the terms and relationships (synonyms/variants, broader/narrower, related terms) identified in Steps 2 and 3 and import them into SchemaLogic • Method: Apply ISO 2788 (Guidelines for the establishment and development of monolingual thesauri), e.g., use the singular rather than the plural form when labeling a concept, and the Simple Knowledge Organization System (SKOS), which provides a model for expressing the basic structure and content of concept schemes • Output: A master topics vocabulary that complies with commonly accepted metadata standards
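To make the SKOS part concrete, here is a minimal Python sketch that writes one concept as Turtle using the standard SKOS properties (skos:prefLabel, skos:altLabel, skos:broader, skos:related). The `ex:` namespace, the concept identifiers, and the helper name `to_turtle` are assumptions for the example; this is not the SchemaLogic import format.

```python
# The SKOS namespace is the real one; the ex: base URI is a placeholder.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/topics/> .\n"
)

def quote(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def to_turtle(concept_id, pref_label, alt_labels=(), broader=(), related=()):
    """Serialize one concept: preferred label, variants, and relationships."""
    lines = [f"ex:{concept_id} a skos:Concept ;"]
    lines.append(f'    skos:prefLabel "{quote(pref_label)}"@en ;')
    for alt in alt_labels:
        lines.append(f'    skos:altLabel "{quote(alt)}"@en ;')
    for parent in broader:
        lines.append(f"    skos:broader ex:{parent} ;")
    for other in related:
        lines.append(f"    skos:related ex:{other} ;")
    # Close the statement: swap the final " ;" for " ."
    lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines)

# Singular preferred label, with the plural kept as a variant.
concept = to_turtle(
    "commodity-price",
    "Commodity price",
    alt_labels=["Commodity prices"],
    broader=["price"],
)
print(PREFIXES + "\n" + concept)
```

Keeping the plural as an alternative label means search and tagging can still match it while the vocabulary shows a single preferred form.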
Step 5: Integrating with Fund Applications • Purpose: • To enable consistent topic browsing, indexing, and tagging across the Fund • To support topical content aggregation by bringing together both documents and data • To obtain feeds from applications such as TagXchange • Method: Web Services • Output: Everybody is happy and efficient!! • World Bank Topics • OECD Topics
Thank you! Questions? Shewan Workneh: sworkneh@imf.org Xiaoli Huang: xhuang3@imf.org