Taxonomies and Metadata for Content Management

Taxonomies and Metadata for Content Management. Michael Huff Information Resource Officer U.S. Department of State. E-Government Act of 2002.

Presentation Transcript


  1. Taxonomies and Metadata for Content Management. Michael Huff, Information Resource Officer, U.S. Department of State

  2. E-Government Act of 2002 • The use of computers and the Internet is rapidly transforming societal interactions and the relationships among citizens, private businesses, and the Government. • The Federal Government has had uneven success in applying advances in information technology to enhance governmental functions and services, achieve more efficient performance, increase access to Government information, and increase citizen participation in Government. • Most Internet-based services of the Federal Government are developed and presented separately, according to the jurisdictional boundaries of an individual department or agency, rather than being integrated cooperatively according to function or topic.

  3. Which U.S. Government organizations are experienced in using metadata & taxonomy tools? • Defense Intelligence Agency • USDA Economic Research Service (ERS) • Federal Aviation Administration • FirstGov • NASA • Small Business Administration • Social Security Administration • Department of State

  4. Taxonomy Metadata

  5. Why use metadata? • Adding metadata to unstructured content allows it to be managed like structured content. • Enriching content with structured metadata is critical for supporting search and personalized content delivery. • Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.

  6. Where does metadata fit in the information system architecture?
     • User experience: How content is presented and how users experience and interact with it dictates its perceived and actual value.
     • Content architecture: Scalable metadata framework to enable content reuse, and handle changes in organization goals, user needs, and retrieval concerns.
     • Tools and technology: The information supply-chain platform that enables workflows, and supports organizational and operational concerns.

  7. What is Dublin Core? • Dublin Core is the metadata standard for describing Internet resources so they are easy to find.
     Timeline: 1995, original workshop held in Dublin, Ohio. 2003, Dublin Core approved as ISO 15836. 2004, Shanghai meeting.
     For more information: http://www.dublincore.org

  8. Why is metadata important?
     • Subject metadata (What & Why): Subject, Description, Coverage
     • Use metadata (How can it be used): Rights & Permissions
     • Asset metadata (Who, Where & When): Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language
     • Relational metadata (Links between and to): Relation
     Benefits: better navigation & discovery; more efficient editorial process.
     (Diagram axis labels: Complexity, Enabled Functionality)
     http://dublincore.org/documents/dcmi-terms/
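
The grouping on this slide can be sketched in code. The following is a minimal illustration, not part of the original presentation: the group keys, the `group_record` helper, and most sample values are assumptions made for the example. It splits a flat Dublin Core record into the four groups named above.

```python
# Element groups as listed on the slide; the short group keys are assumed names.
DUBLIN_CORE_GROUPS = {
    "subject": ["subject", "description", "coverage"],
    "use": ["rights"],
    "asset": ["title", "creator", "publisher", "contributor", "date",
              "type", "format", "identifier", "source", "language"],
    "relational": ["relation"],
}


def group_record(record):
    """Split a flat Dublin Core record into the four groups above."""
    grouped = {}
    for group, elements in DUBLIN_CORE_GROUPS.items():
        present = {e: record[e] for e in elements if e in record}
        if present:
            grouped[group] = present
    return grouped


# Sample record: title and creator come from slide 1, the rest is invented.
sample = {
    "title": "Taxonomies and Metadata for Content Management",
    "creator": "Michael Huff",
    "type": "Presentation",
    "language": "en",
    "subject": "Taxonomy; Metadata",
    "rights": "Hypothetical rights statement",
}
grouped = group_record(sample)
```

Groups with no populated elements (here, the relational group) are simply omitted from the result.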

  9. What is a taxonomy? The specification of the names of people, places, things … and everything else that is needed to allow search engines and other content applications to work better.
     Linnaeus: Kingdom Animalia > Phylum Chordata > Class Mammalia > Order Carnivora > Family Canidae > Genus Canis > Species C. familiaris
     UNSPSC: Segment 44 (Office Equipment and Accessories and Supplies) > Family .12 (Office Supplies) > Class .17 (Writing Instruments) > Commodity .05 (Mechanical pencils), .06 (Wooden pencils), .07 (Colored pencils)
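
As an illustration of how a hierarchical code carries its own ancestry, here is a small sketch. It assumes the four two-digit levels shown for UNSPSC concatenate into a single 8-digit code (so segment 44, family 12, class 17, commodity 05 becomes 44121705); the function names are hypothetical.

```python
LEVELS = ("segment", "family", "class", "commodity")


def split_unspsc(code):
    """Split an 8-digit code into its four two-digit levels."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError("expected 8 digits")
    return {level: code[i * 2:i * 2 + 2] for i, level in enumerate(LEVELS)}


def ancestors(code):
    """Return the zero-padded code of each level, from segment down to the code itself."""
    return [code[:n].ljust(8, "0") for n in (2, 4, 6, 8)]


levels = split_unspsc("44121705")
path = ancestors("44121705")
```

With this layout, a search application can broaden a query from a commodity to its class or family by truncating the code rather than looking up a separate parent table.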

  10. Sample Recipe Taxonomy (facet categories and their controlled vocabularies)
     • Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
     • Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack
     • Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
     • Courses: Appetizers, Beverages, Breads, Cheese, Cocktails, Desserts, Fish & Shellfish, Fruit, Hors d'Oeuvres, Meat, Pasta, Salad, Sandwiches, Soup, Vegetables
     • Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry

  11. The power of taxonomy facets (example facets: Main Ingredients, Meal Type, Cuisines, Cooking Methods, with the same vocabularies as on the previous slide) • 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4) • Easier to maintain • Can be easier to navigate
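
The arithmetic behind the first bullet can be checked with a short sketch. The four facets and their placeholder values below are hypothetical; only the counts matter.

```python
from itertools import product

# Four hypothetical facets with ten placeholder values each.
facets = {f"facet_{i}": [f"f{i}_v{j}" for j in range(10)] for i in range(1, 5)}

# Nodes an editor has to maintain across all facets.
nodes_to_maintain = sum(len(values) for values in facets.values())

# Distinct combinations a piece of content can be tagged with.
combinations = list(product(*facets.values()))

print(nodes_to_maintain)   # 40
print(len(combinations))   # 10000
```

Forty maintained nodes distinguish as many cases as a single 10,000-leaf hierarchy would, which is the maintenance argument the slide makes.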

  12. 7 Common taxonomy facets Personalized content delivery requires defining taxonomy facets … and re-use of existing vocabulary sources

  13. Applying the facets to the Dublin Core metadata elements Applied taxonomy metadata facilitates a multi-faceted view of content

  14. Facets at work on the FirstGov site (http://www.firstgov.gov): Frequency, Organization, Audience, Content Type

  15. Guided Navigation: 2-3 clicks to product, no dead ends. http://www.tesco.com/winestore
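
One way to get the "no dead ends" property is to offer only facet values that still occur in the current result set. The sketch below assumes that approach; the product data and function names are invented for illustration and do not describe how the site shown on the slide is built.

```python
from collections import Counter


def refine(items, selected):
    """Keep items matching every selected facet value."""
    return [it for it in items
            if all(it.get(f) == v for f, v in selected.items())]


def available_refinements(items, selected):
    """Count remaining values for each facet not yet selected.

    Only values present in the current result set are counted, so
    every offered refinement leads to at least one item.
    """
    counts = {}
    for it in refine(items, selected):
        for facet, value in it.items():
            if facet in selected or facet == "name":
                continue
            counts.setdefault(facet, Counter())[value] += 1
    return counts


# Hypothetical catalogue records tagged with three facets.
wines = [
    {"name": "Wine A", "colour": "red", "country": "France", "price_band": "under 10"},
    {"name": "Wine B", "colour": "red", "country": "Chile", "price_band": "under 10"},
    {"name": "Wine C", "colour": "white", "country": "France", "price_band": "10 to 20"},
]
options = available_refinements(wines, {"colour": "red"})
```

After selecting red wines, the "10 to 20" price band is no longer offered because no remaining item carries it.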

  16. http://www.towerrecords.com

  17. http://www.fortunoff.com

  18. Seven practical rules for taxonomies • Incremental, extensible process that identifies and enables owners, and engages stakeholders. • Quick implementation that provides measurable results as quickly as possible. • Not monolithic—has separately maintainable facets. • Re-uses existing IP as much as possible. • A means to an end, and not the end in itself. • Not perfect, but it does the job it is supposed to do—such as improving search and navigation. • Improved over time, and maintained.

  19. • Creating a taxonomy is only part of the job • How will it be put to use? • In a new application, or by modifying an existing application? • What’s the effort around that? • Additional Issues • Tagging – Who will add the metadata and how?
     Example uses: Browse by Topic; link to bios from personal names; link to company data (quotes, news, ...) from company names; link to info on countries; alerts on people, companies, and topics.

  20. Seven tasks:
     1. Identify Objectives: Conduct interviews
     2. Inventory Content: ID sources, spider assets & extract metadata
     3. Specify Metadata: Define fields & purpose
     4. Model Content: Define content chunks & XML DTDs
     5. Specify Vocabularies: Compile controlled vocabularies
     6. Specify Procedures: Develop workflow, rules & procedures
     7. Train Staff: Develop materials & train staff

  21. Task 1 – Identify objectives What do you do? What kinds of digital assets are being produced? For what audiences? What is the business process for submitting, selecting, editing, and maintaining digital assets? How many digital assets are there? How fast is this growing? Are there particular industry or other standards that are important? What types of assets are hard to search for (that should be easier to find)? What tools would be helpful in locating assets? Acronyms? Abbreviations? Nicknames? Glossary? Thesaurus? Taxonomy? Who else should we be talking to?

  22. Task 2 – Inventory content 1. Identify target asset file path/URL. 2. Automatically generate inventory metadata by crawling file stores. 3. Audit assets using inventory. 4. Enhance metadata with new facets. (Diagram labels: Path/URL, Spider-generated, Audit process, New facets)
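
Step 2 can be sketched with the standard library. This is a minimal example under assumed field names: `audience` and `content_type` stand in for whatever new facets the audit adds, and a production spider would also cover URLs, which this sketch does not.

```python
import mimetypes
import os
from datetime import datetime, timezone


def crawl_inventory(root):
    """Walk a file store and return one inventory row per file."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            rows.append({
                # Spider-generated fields.
                "path": path,
                "format": mimetypes.guess_type(name)[0] or "unknown",
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc).isoformat(),
                # New facets (assumed names), left empty for the audit step.
                "audience": None,
                "content_type": None,
            })
    return rows
```

Calling `crawl_inventory("/path/to/file/store")` returns a list of dictionaries that can be written to a spreadsheet for the audit in step 3.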

  23. Task 3 – Specify metadata Legend: ? – 1 or more * - 0 or more

  24. Task 4 – Model content
     • Factor asset types from inventory into canonical types.
     • Select examples from inventory (possibly with spider).
     • Identify useful chunks for each asset type.
     • Factor chunks into element superset.
     • Identify relationships between chunks.
     • Iterate until stakeholders agree on asset types, elements, and relationships.
     (Page layout diagram: header area, left navigation area, main content area, footer area)

  25. Task 5 – Specify vocabularies
     • Develop broad taxonomy outline (1-3 levels deep).
     • Review, revise, and approve taxonomy outline with stakeholders and subject matter experts.
     • Fill in taxonomy outline.
     • Tag random samples from content inventory.
     • Review, revise, and approve draft taxonomy with stakeholders and subject matter experts.

  26. Task 6 – Specify procedures Develop taxonomy style rules, ensure that the taxonomy follows them. Develop tagging rules and procedures, along with software to assist in the task. Specify taxonomy maintenance process and the update procedures to follow.

  27. Task 6 – Governance & Maintenance The taxonomy must be changed over time. Suggestions for changes can come from users, through query log analysis, and from staff, through a feedback form. A governance structure is needed to make sure changes are justified.
     Inputs to the Taxonomy Editor: staff notes on ‘missing’ concepts; query log analysis.
     Recommendations by Editor: 1. Small taxonomy changes (labels, synonyms) 2. Large taxonomy changes (retagging, application changes) 3. New ‘best bets’ content
     Steering Committee considerations: 1. Business goals 2. Change in user experience 3. Retagging cost
     (Diagram components: End User, Firewall, Application UI, Application Logic, Content, Tagging UI, Tagging Logic, Tagging Staff, Taxonomy, Taxonomy Editor, Steering Committee)
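
Query log analysis can be as simple as counting searches that match no taxonomy label. The sketch below assumes exact, case-insensitive matching and uses invented terms and queries; a real analysis would likely also handle synonyms and stemming.

```python
from collections import Counter


def missing_concepts(queries, taxonomy_terms, min_count=2):
    """Return frequent queries that match no taxonomy label, most common first."""
    known = {t.lower() for t in taxonomy_terms}
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return [(q, n) for q, n in counts.most_common()
            if q not in known and n >= min_count]


# Hypothetical taxonomy labels and search log.
taxonomy_terms = ["Visas", "Passports", "Travel warnings"]
query_log = ["passports", "visa waiver", "Visa waiver",
             "visas", "embassy hours", "visa waiver"]
candidates = missing_concepts(query_log, taxonomy_terms)
```

Here the only candidate is "visa waiver" (three searches); "embassy hours" appears once and falls below the `min_count` threshold, so it would not yet be put before the committee.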

  28. Task 6 – Steering Committee Roles
     • Business Lead: Keeps committee on track with larger business objectives. Balances cost/benefit issues to decide appropriate levels of effort (specialists help in estimating costs). Obtains needed resources if those in committee can’t accomplish a particular task.
     • Technical Specialist: Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc. Helps obtain data from various systems.
     • Content Specialist: Committee’s liaison to content creators. Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
     • Taxonomy Specialist: Suggests potential taxonomy changes based on analysis of query logs and indexer feedback. Makes edits to taxonomy, installs into system with aid of IT specialist.
     • Content Owner: Reality check on process change suggestions.

  29. Task 7 – Train staff
     Staff will require training on:
     • The UI they use to tag the content (indexing UI)
     • The rules to follow when deciding what codes to apply
     • The end-effect of the codes they apply
     • The structure of the taxonomy
     Tagging examples come from the content inventory. Hardcopies of the taxonomy, and yellow highlighters, are helpful during training.

  30. What about Automatic Categorization? • Automatic vs. Manual Categorization is a cost/benefit tradeoff • Semi-automated recommended over pure manual in production situations. • Automatic performance not bad, but not equal to trained manual tagging. • Software is not sane, so errors look crazy. • Large backlogs of content can’t justify investment of high-quality manual tagging • Old articles rarely accessed. • Recommend automated bulk tagging with error reporting and correction process.
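
The recommended semi-automated process (bulk tagging plus a correction queue) might look like the following sketch. The keyword-overlap scoring, the 0.6 threshold, and the sample rules and documents are all assumptions chosen to keep the example small; commercial categorizers use far richer models.

```python
def suggest_tags(text, keyword_rules):
    """Score each tag by the share of its keywords found in the text."""
    words = set(text.lower().split())
    scores = {}
    for tag, keywords in keyword_rules.items():
        hits = sum(1 for k in keywords if k in words)
        if hits:
            scores[tag] = hits / len(keywords)
    return scores


def bulk_tag(documents, keyword_rules, threshold=0.6):
    """Auto-apply confident tags; queue everything else for manual review."""
    tagged, review_queue = {}, []
    for doc_id, text in documents.items():
        scores = suggest_tags(text, keyword_rules)
        confident = sorted(t for t, s in scores.items() if s >= threshold)
        if confident:
            tagged[doc_id] = confident
        else:
            review_queue.append((doc_id, scores))
    return tagged, review_queue


# Hypothetical rules and documents.
rules = {
    "Desserts": ["chocolate", "cake", "sugar"],
    "Grill": ["grill", "charcoal"],
}
docs = {
    "r1": "chocolate cake with sugar glaze",
    "r2": "charcoal tips",
    "r3": "slow cooking a stew",
}
tagged, review_queue = bulk_tag(docs, rules)
```

Document r1 is tagged automatically, while r2 (a weak match) and r3 (no match) land in the review queue, where a trained tagger confirms or corrects them.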

  31. What about automatically-created taxonomies? • Typically a single hierarchy with no overall plan • Results hard for people to navigate
     What about automatic categorization? • Accuracy close to human levels, but errors are very different • Cost/benefit tradeoff • Semi-automation is best practice

  32. Enterprise taxonomy maintenance workflow (flowchart)
     Steps: Suggest new name/category > Review new name (Problem? Yes/No) > Copy edit new name (Problem? Yes/No) > Add to enterprise taxonomy
     Roles and tools: Analyst, Editor, Copywriter, Sys Admin, Taxonomy Tool

  33. Categorize with a purpose
     What is the problem you are trying to solve? • Improve search • Browse for content on an enterprise-wide portal • Enable users to syndicate content • Otherwise provide the basis for content re-use
     How will you control the cost of creating and maintaining the metadata needed to solve these problems? • CMS with metadata tagging products • Semi-automated classification • Taxonomy editing tools • Guided navigation tools

  34. How do you sell it? • Don’t sell the taxonomy, sell the vision of what you want to be able to do • Clearly understand what the problem is and what the opportunities are • Costs and benefits • Design the taxonomy in relation to the value at hand

  35. Internet Resources

  36. U.S. Government Resources

  37. http://www.nasa.gov/home/index.html

  38. http://pub-lib.jpl.nasa.gov/pub-lib/dscgi/ds.py/View/Collection-10

  39. http://www.loc.gov/flicc/wg/taxonomy.html

  40. http://www.loc.gov/lexico/servlet/lexico/

  41. http://www.archives.gov/federal_register/code_of_federal_regulations/thesaurus.html

  42. http://feapmo.gov/

  43. http://www.km.gov/

  44. Other Resources

  45. http://www.educause.edu/asp/taxonomy/show_taxonomy_links.asp?TREE=1&EXPAND=1

  46. http://databases.unesco.org/thesaurus/

  47. http://www.naa.gov.au/recordkeeping/control/functions_thesaur/contents.html
