Enterprise Taxonomies - Context, Structures & Integration. Presentation to American Society of Indexers Annual Conference – Arlington Virginia – May 15, 2004. Denise A. D. Bedford.
Background • Systems analyst & information architect • Cataloger/classifier • Collection development – Russian East European Collections • Acquisitions Librarian/Bibliographic Searcher • Reference librarian • Children's Librarian • Usability engineer • Worked for publishers & bookstores • Professor -- Information/Library/Computer Science education • I’ve seen it from all angles…
Presentation Overview • Enterprise Content Architecture Basics • Taxonomy Basics • Strategy for creating your enterprise content architecture
Voices of Experience • Recently we looked back at what we had learned in implementing content management systems, intranets, external web sites • As we embark upon an Enterprise Content Architecture we found we had learned 17 lessons • The top lesson that we agreed we had learned was to begin any of these projects with a high level reference model – essentially a blueprint • >5% of my time is devoted to all I will show you today – possible because of reference model base
Enterprise Architecture Basics • Design your Enterprise Architecture to support your goals • Enterprise implies integration and context • High level reference model must take into account the following • Functional Architecture • Technical Architecture • Content Architecture • Presentation Architecture
What are the Goals of the World Bank Enterprise Architecture? • Increase the value and quality of content - Build intelligent relationships among disparate content sources using concepts and metadata - Define, enforce, monitor processes/procedures on content collections to ensure quality • Facilitate integration and repurposing of content - Provide broad search and retrieval capabilities - Increase reuse and decrease redundancy across content providers • Consistent information security and disclosure enforcement - Bank records must be consistent in order to facilitate disclosure policy compliance and information sharing for partners • Simplify and complete the content life-cycle - Reduce the number of user-facing content entry points by using already existent business processes - Manage content end-to-end from initial inception to final disposition
Content Integration • Content integration in the World Bank Catalog Search & Browse • Content Integration on the External Web Site • Content Integration in Project Portal • Content Integration in Donors Portal • For example…
Project Portal – Project Context (diagram labels): Data Charts Content; Documents & Records Content; People & Communities Content; Publications Content; Knowledge Content
Donor Portal – Donor Context (diagram labels): Data Charts; Data Reports Content; Services Content; Documents & Records Content
External Web Site – Public Info Context (diagram labels): Communications Content; Knowledge Content; People & Communities Content; Documents & Records Content; Services Content; Publications Content
Audience Focused Context • Voting & Elections • Retirement Benefits • Energy • Legal & Judicial Resources • Tax Resources • Law Enforcement • Passport & Visa • Consumer Protection • Government Locator • Health & Medical • Agriculture
Individual Focused Context • My Voting Information Today • My Retirement Benefits Today • My Legal Rights Today In Regards to a Specific Incident • My Heating Bills • My Tax Returns • Who are My Law Enforcement Contacts • Consumer Protection Pertaining to What I Purchase • My Passport & Visa • My Local Government Offices • My Medical Benefits
Where do you start? Reference Models
Blueprint Your Enterprise Content Architecture • Blueprint your ECA just as you would a home - by thinking about what it will contain, how it will be used and who will use it • Would you simply chat with an architect, a carpenter, a plumber and an electrician and trust that they’ll build the home you need? • End game of blueprinting your ECA is a high level reference model • Taxonomies live in every component of your ECA – they become ECA when you integrate them
Benefits of Reference Model • High level reference model enables: • Open architectures – swapping in and swapping out components over time without loss of investment • Appropriate functional growth at the component level • Extensibility of content coverage • Scalability of the architecture in terms of volume of content and level of use • Emergence of an enterprise level thinking about how to manage content • Enterprise level thinking about stewardship and governance of information
Blueprinting Example – World Bank • Let’s walk through a blueprinting exercise to see how we came to discover our functional, technical, content and presentation architectures
Content Scatter & Integration • Content Integration problem -- • Documents in IRIS, ImageBank, IRAMS… • Data in BW, DEC SIMA queries in central, regional & agency databases, CDF indicators, GDF data reports • Publications in JOLIS, Office of Publisher, Thematic Group databases… • Communications in External Affairs, Office of President, DEC, IRIS… • People & Communities in YourNet, PeopleSoft, WBDirectory,… • Knowledge in Notes databases, Oral History program,… • Services in WB Yellow Pages, Service Portal,… • Collections in EIU database, Oxford Analytica
Kind of Content to Support • Content type is different than format type – content is defined as the kind of information that is contained in an information object • Began with a comprehensive survey of all kinds of content in our information systems including SAP, Lotus Notes Databases and Email, Document Management, Archives, Intranet, External Web, unit-specific repositories, EnCorr correspondence system • Grouped content we found into eight top level classes – retained the second level classes as system specific – we are harmonizing at second level over time • Top level classes were defined by the purpose of the content as well as content architecture/structure
Enterprise Level Content Type Classification Scheme • Begin to use the architecture of content to manage from the point of creation through full life-cycle • Top Tier (Institutional) Content Types • Comprised of broad ‘buckets’ or content types • Comparable metadata & meta-information • Accessed, used & presented in similar ways • Content lives in different source systems • Virtual attribute for metadata at institutional level • Facilitates searching for a type of content across sources • Second Tier (Business System) Content Types • Source system resource types mapped to top tier groups • Specific administrative value in source system • Access controlled at this level • Content typically lives in one source system
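The two-tier scheme above can be sketched as a simple lookup. This is only an illustration, not the Bank's implementation: the top tier names are taken from the eight groups on the Content Scatter slide, while the second tier resource type names ("Memo", "Book", and so on) and both function names are hypothetical.

```python
from collections import defaultdict

# Top tier (institutional) content types, as grouped on the Content Scatter slide.
TOP_TIER = {
    "Documents", "Data", "Publications", "Communications",
    "People & Communities", "Knowledge", "Services", "Collections",
}

# Hypothetical second tier: (source system, resource type) -> top tier type.
# The resource type names are invented for illustration only.
SECOND_TIER_MAP = {
    ("IRIS", "Memo"): "Documents",
    ("IRAMS", "Archived File"): "Documents",
    ("JOLIS", "Book"): "Publications",
    ("PeopleSoft", "Staff Record"): "People & Communities",
}


def top_tier_for(system, resource_type):
    """Return the institutional content type for a source system resource type."""
    top = SECOND_TIER_MAP[(system, resource_type)]
    if top not in TOP_TIER:
        raise ValueError(f"unknown top tier type: {top}")
    return top


def sources_by_top_tier():
    """Group source systems by top tier type, to support search across sources."""
    grouped = defaultdict(set)
    for (system, _resource_type), top in SECOND_TIER_MAP.items():
        grouped[top].add(system)
    return dict(grouped)
```

The source system keeps its own resource types; only the mapping lives at the enterprise level, which is what lets a search for "Documents" span IRIS and IRAMS.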
Enterprise Content Architecture • Each organization has to make its own decisions here • We have to respect the business system ownership of the content • We leave business system information intact, map to enterprise content architecture • ECM then means managing functionality using a high level set of metadata across the organization • Means harmonizing attributes and in some cases managing the values for those attributes
Big Picture Enterprise Content Architecture (diagram labels): Site Specific Searching; Publications Catalog; World Bank Catalog/Enterprise Search; Recommender Engines; Personal Profiles; Portal Content Syndication; Browse & Navigation Structures; Metadata Repository of Bank Standard Metadata; Reference Tables (Topics, Countries, Document Types); Transformation Rules; Data Governance Bodies; Metadata Extract (six feeds); IRIS Doc Mgmt System; Web Content Mgmt. Metadata; Board Documents Metadata; IRAMS Metadata; JOLIS Metadata; InfoShop Metadata; Concept Extraction, Categorization & Summarization Technologies
World Bank ECA (diagram labels): Content Contributor; End User; Content Systems; DELIVERY; Metadata Management and Security Services; ePublish; PDS; ….; access rules; Content Access Services; Content Management Services; view; multilingual srch; workflow; create/del.; check in/out; retention schedule; search; syndication; versioning; declare; classification; browsing; notification; Business Activity; Topic Class Scheme; Content Integration and Archives Services; relate; Connector; Concept extraction; rules evaluator; harmonize; Adapter; thesaurus; Series Names; monitors; SAP (R/3, BW); Notes / Domino; Archives; Store Over Time; Documents, Images, Audio, Data records; Metadata warehouse; logs; People Soft; iLAP; Repositories Services; Business Systems
Basic Functional Components for Goals • Content Integration Services • Metadata harvest, rationalization and harmonization • Access to metadata entries, content maps and content • Repository Services • Defined storage strategy for content over time • High performance, accessible and scalable metadata and content stores • Content Access Services • Bank-wide search and retrieval • Access control for all bank records • Syndication of content to partner institutions – e.g. GDG
Basic Functional Components for Goals • Content Management Services • Content management function oriented services – versioning, check-in/check-out, collaboration, work flow • Metadata Management and Security services • Services managing reference data, data dictionaries, taxonomies, thesaurus, business rules (access, security, disposition) which cut across all services
Enterprise Thinking • In the future, we hope to achieve enterprise wide use of full range of reference tables • Some will be ‘closed loop’ stewardship models • Some will be ‘bi-directional’ stewardship models • Idea is that different groups throughout the enterprise will become stewards of different reference sources • Governance models and taxonomy structures need to be suited to their purpose – not just one kind of taxonomy or one way to govern
Content Architectures • Content types can evolve into content architecture specifications • Content architecture specifications can evolve into input templates – in future building from content element level • You cannot repurpose and decompose working from BLOBs • To manage content type creep, define libraries of content elements within the Top Level types • Grow content templates at the element level but within content type element libraries • Example of doing top down and bottom up development work
Designing for Use • Metadata provides the lowest level of the blueprint for how our content will be used • In an ECA, assumption is that use is enabled across systems • Need to have a core set of metadata that are available across systems to support the ECA • If you have enterprise content types then you are in a better position to see what that core set is • Traditionally, metadata focuses heavily on content features and pays less attention to how it will be used
World Bank Metadata Requirements • Standard metadata schemes are primarily encoding schemes – don’t just accept someone else’s encoding scheme • You should begin by understanding purpose of metadata attributes in a schema • We have used Use Case modeling as a technique to: • help us understand how content will be used • identify the kinds of access points we need • determine how each access point will behave • determine what kind of underlying taxonomy supports it • Knowledge & Learning Environment
Metadata Basics • Assume you will not change the current business systems • Challenge here is to manage complexity, maintain source systems, respect content security & still meet users' expectations • Support integrated use by creating a warehouse of metadata pertinent to access, search, syndication, use management, records compliance and learning • Define metadata attribute super classes to which existing business system metadata are mapped • Attributes may be rationalized, harmonized or value-controlled within super classes
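Mapping business system attributes to super classes, with value control on one of them, might look roughly like the sketch below. Everything named here is an assumption made for illustration: the local attribute names, the super class names ("creator", "language"), the language code table and the `harmonize` function are not taken from the Bank's systems.

```python
# Hypothetical mapping: (source system, local attribute name) -> super class attribute.
ATTRIBUTE_MAP = {
    ("IRIS", "doc_author"): "creator",
    ("JOLIS", "main_entry"): "creator",
    ("IRIS", "lang"): "language",
    ("JOLIS", "language_code"): "language",
}

# Hypothetical value control for the "language" super class attribute.
LANGUAGE_VALUES = {"en": "English", "eng": "English", "fr": "French", "fre": "French"}


def harmonize(system, record):
    """Copy mapped attributes into a warehouse entry; the source record is not changed."""
    entry = {"source_system": system}
    for name, value in record.items():
        target = ATTRIBUTE_MAP.get((system, name))
        if target is None:
            continue  # unmapped attributes stay in the business system only
        if target == "language":
            value = LANGUAGE_VALUES.get(value.lower(), value)
        entry[target] = value
    return entry


# A tiny metadata warehouse built from two source systems.
warehouse = [
    harmonize("IRIS", {"doc_author": "A. Author", "lang": "EN", "room": "B-12"}),
    harmonize("JOLIS", {"main_entry": "B. Writer", "language_code": "fre"}),
]
```

The business systems are left untouched; only the warehouse entries carry the harmonized attribute names and controlled values.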
Bank Metadata – Purpose & Taxonomies (diagram labels): Identification/Distinction; Search & Browse; Compliant Document Management; Use Management; Hierarchical Taxonomy; Network Taxonomy; Faceted Taxonomy; Flat Taxonomy
Taxonomy Examples • Enterprise Topic Classification Scheme – hierarchical taxonomy • World Bank Thesaurus – English, French, Spanish – network taxonomy • Metadata Attribute Detailed Specifications – faceted taxonomy • Content Type Classification Scheme – hierarchical taxonomy • Transformation Rules – faceted taxonomy
The ECA Taxonomy View (diagram labels): Thesaurus; Language; Topics
Taxonomy Basics • Given this blueprint, let’s step back and examine: • Where we find taxonomies • What kind of taxonomies we need • Where we have what we need already • Where we should integrate what exists • Where we need to start from scratch • When we do start from scratch, how do we begin
Definition of a taxonomy • “System for naming and organizing things into groups that share similar characteristics” (diagram labels: Taxonomy; Architectures; Applications)
Taxonomy Architectures • Taxonomy architectures are important to designing taxonomies which: • are suited to their purpose • sustainable over time • provide strong application support to information applications in the new challenging web environment • Taxonomy = architecture + application + usability • Time is too short today to go into the usability issues deeply, but be aware that they are design & implementation issues
Taxonomy Applications • Taxonomies are structures which can be explicitly presented - they can be distinct data structures or interface features • Taxonomies are structures which can be implicitly designed into an application - structures which are embedded or designed into the content or transaction that is being managed
Taxonomy Architectures • There are four types of taxonomy architectures: • Flat • Hierarchical • Network • Faceted • In my experience, most of the problems we encounter working with ‘taxonomies’ derive from the fact that we don’t establish the type of taxonomy architecture we need before we begin creating them!
Flat Taxonomy Architecture Energy Environment Education Economics Transport Trade Labor Agriculture
Flat Taxonomies • Group content into a controlled set of categories • There is no inherent relationship among the categories - they are co-equal groups with labels • The structure is one of ‘membership’ in the taxonomy • Alphabetical listing of people is a flat taxonomy • Lists of countries or states • Lists of currencies • Controlled vocabularies • List of security classification values
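A flat taxonomy reduces to set membership, which a few lines can illustrate. The category labels come from the Flat Taxonomy Architecture slide; the `validate` function is a hypothetical helper, not part of any described system.

```python
# Flat taxonomy: a controlled set of co-equal categories with no relationships among them.
TOPICS = frozenset({
    "Energy", "Environment", "Education", "Economics",
    "Transport", "Trade", "Labor", "Agriculture",
})


def validate(value, taxonomy=TOPICS):
    """Accept a value only if it is a member of the controlled set."""
    if value not in taxonomy:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return value
```

Because the only structure is membership, the same pattern would serve for country lists, currency lists or security classification values.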
Facet Taxonomy Architecture Faceted taxonomy architecture looks like a star. Each node in the star structure is associated with the object in the center.
Facet Taxonomies • Facets can describe a property or value • Facets can represent different views or aspects of a single topic • The contents of each attribute may have other kinds of taxonomies associated with them • Facets are attributes - their values are called facet values • Meaning in the structure derives from the association of the categories to the object or primary topic • Put a person in the center of a facet taxonomy for e-gov, for KLE initiatives
Metadata as Facet Taxonomy • Metadata is one type of faceted taxonomy • Each attribute is a facet of a content object • Creator/Author • Title • Language • Publication Date • Access Rights • Format • Edition • Keywords • Topics
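Treating each metadata attribute as a facet of a content object can be sketched as follows. The class uses a subset of the attributes listed above; the sample records and the `facet_filter` helper are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """The object at the center of the star; each field is one facet."""
    title: str
    creator: str
    language: str
    format: str
    topics: list = field(default_factory=list)


def matches(item, facet, wanted):
    """True if the item's facet equals, or (for multi-valued facets) contains, the value."""
    value = getattr(item, facet)
    if isinstance(value, (list, set, tuple)):
        return wanted in value
    return value == wanted


def facet_filter(items, **facets):
    """Keep the items that match every requested facet value."""
    return [i for i in items if all(matches(i, f, w) for f, w in facets.items())]


# Hypothetical sample records.
items = [
    ContentObject("Rural Electrification Review", "A. Author", "English", "PDF", ["Energy"]),
    ContentObject("Energie et Transport", "B. Writer", "French", "HTML", ["Energy", "Transport"]),
    ContentObject("Trade Outlook", "B. Writer", "French", "PDF", ["Trade"]),
]

french_energy = facet_filter(items, language="French", topics="Energy")
```

Each facet can be backed by a different kind of taxonomy: here `language` would draw on a flat list, while `topics` could draw on a hierarchical scheme.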
Hierarchical Taxonomy Architecture A hierarchical taxonomy is represented as a tree architecture. The tree consists of nodes and links. The relationships become ‘associations’ with meaning. Meanings in a hierarchy are fairly limited in scope – group membership, type, instance. In a hierarchical taxonomy, a node can have only one parent.
Hierarchical Taxonomies • Hierarchical taxonomies structure content into at least two levels • Hierarchies are bi-directional • Each direction has meaning • Moving up the hierarchy means expanding the category or concept • Moving down the hierarchy means refining the category or the concept
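The two directions of a hierarchy can be shown with a single parent table, which also enforces the one-parent rule of a tree. The category names below are hypothetical examples, as are the function names.

```python
# Each node maps to exactly one parent (None marks the root), so the structure is a tree.
PARENT = {
    "Energy": None,
    "Renewable Energy": "Energy",
    "Solar Power": "Renewable Energy",
    "Wind Power": "Renewable Energy",
}


def broader(node):
    """Move up the hierarchy: each step expands the category."""
    path = []
    parent = PARENT[node]
    while parent is not None:
        path.append(parent)
        parent = PARENT[parent]
    return path


def narrower(node):
    """Move one level down the hierarchy: refine the category."""
    return sorted(child for child, parent in PARENT.items() if parent == node)
```

A search application could use `broader` to widen a query that returns too little and `narrower` to offer refinements when it returns too much.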
Network Taxonomy Architecture A network taxonomy is a plex architecture. Each node can have more than one parent. Any item in a plex structure can be linked to any other item. In plex structures, links can be meaningful & different.
Network taxonomies • Taxonomy which organizes content into both hierarchical & associative categories • Combination of hierarchy & star architectures • Any two nodes in a network taxonomy may be linked • Categories or concepts are linked to one another based on the nature of their associations • Links may have more complex meanings than we find in hierarchical taxonomies
Network taxonomies • Network taxonomies allow us to design complex thesauri, ontologies, concept maps, topic maps, knowledge maps, knowledge representations • The future semantic web will have a network architecture where the associations among the concepts not only have distinct meanings but also have contextualized rules to link them • Often meaningful links take the form of a ‘Prolog-like’ grammar • has_color • is_a_cause_of • is_a_process_of • Caution – don’t let someone build a hierarchy for you when you need a network structure
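Typed links of this kind are often stored as subject, link, object triples. The sketch below assumes that representation; the concepts are invented, `is_a_cause_of` and `is_a_process_of` come from the slide, and `broader_term` is a hypothetical link name added to show a node with more than one parent.

```python
from collections import defaultdict

# (subject, link type, object) triples; any two nodes may be linked.
TRIPLES = [
    ("Deforestation", "is_a_cause_of", "Soil Erosion"),
    ("Deforestation", "is_a_cause_of", "Habitat Loss"),
    ("Soil Erosion", "is_a_process_of", "Land Degradation"),
    ("Soil Erosion", "broader_term", "Environment"),  # hypothetical link name
    ("Soil Erosion", "broader_term", "Agriculture"),  # second parent: not a tree
]


def build_index(triples):
    """Index the triples as subject -> link type -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, link, obj in triples:
        index[subject][link].add(obj)
    return index


INDEX = build_index(TRIPLES)


def linked(node, link):
    """Return the nodes reachable from `node` over one link of the given type."""
    return sorted(INDEX[node][link])
```

Forcing this data into a single-parent hierarchy would drop one of the two parents of "Soil Erosion" and all of the causal links, which is the risk the caution above describes.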