Frequently Asked Questions about Taxonomies and Metadata Ron Daniel Taxonomy Strategies LLC rdaniel@taxonomystrategies.com
Agenda • FAQs – Frequently Asked Questions • SAQs – Seldom Asked Questions • Fun Questions
Pop Quiz On a blank piece of paper: • What question(s) did you want to have answered by coming to today’s talks? Please provide your job title, division, and either company or company type. You do NOT have to provide your name.
What do other people ask about? • How to build a taxonomy? • Definitions of terms. • How to govern its use and maintenance? • What’s the ROI? • What are they for? • How do we put them to use? • How do we link them to content? • How do they help search? • How do I sell management on a taxonomy project? • How do we maintain them? and many more…
What is a taxonomy – just a folder structure or something else? • There is no agreed definition of what a “taxonomy” is. • When talking with someone about taxonomy, make sure you are talking about the same things. • When we talk about a taxonomy, we are NOT only talking about a website navigation scheme. • Websites change frequently; we are looking for a more durable way to deal with content so that different navigation schemes can be used over time. • We look at taxonomies and metadata together. • We typically create a metadata specification that defines fields like Title, Description, Date, Type, Subject, etc. • Several fields (e.g. Type and Subject) have pre-defined lists of allowed values. • Those lists of values, flat or hierarchical, are “facets” within the overall taxonomy.
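A minimal sketch of such a metadata specification, assuming Python. The field names come from the slide; the allowed values, the sample records, and the `validate_record` helper are hypothetical and only illustrate how fields with pre-defined value lists differ from free-text fields.

```python
# Hypothetical metadata spec: free-text fields plus fields with controlled values.
CONTROLLED_VOCABULARIES = {
    "Type": {"Article", "FAQ", "Press Release"},    # illustrative values only
    "Subject": {"Taxonomy", "Metadata", "Search"},  # illustrative values only
}
FREE_TEXT_FIELDS = {"Title", "Description", "Date"}


def validate_record(record):
    """Return a list of problems found in one metadata record."""
    problems = []
    for field, value in record.items():
        if field in CONTROLLED_VOCABULARIES:
            # Controlled fields must use a value from the pre-defined list.
            if value not in CONTROLLED_VOCABULARIES[field]:
                problems.append(f"{field}: '{value}' is not an allowed value")
        elif field not in FREE_TEXT_FIELDS:
            problems.append(f"{field}: unknown field")
    return problems


# Made-up sample records.
good = {"Title": "Taxonomy FAQ", "Type": "FAQ", "Subject": "Taxonomy"}
bad = {"Title": "Untitled", "Type": "Memo", "Colour": "blue"}

print(validate_record(good))
print(validate_record(bad))
```

The same check could run at tagging time, so that inconsistent values are caught before they reach search.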
How do taxonomies actually improve search? Input (Query) Side • “Search” using a small set of pre-defined values instead of trying to guess what word or words might have been used in the content. • Providing dropdowns instead of search improves results, but is limiting. • Have synonyms mapped together so searches for “car” and “automobile” return the same things. Output (Results) Side • Organize search results into groups of related items. • Sorting and filtering • Refinement
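Both sides can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any particular search engine's behavior: the synonym table, the sample items, and the `search` and `group_by_facet` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym table: every variant maps to one preferred term.
SYNONYMS = {"car": "automobile", "auto": "automobile", "automobile": "automobile"}

# Hypothetical tagged content: each item carries a subject term and a type facet.
ITEMS = [
    {"title": "Used sedan listings", "subject": "automobile", "type": "Catalog"},
    {"title": "Engine repair guide", "subject": "automobile", "type": "Manual"},
    {"title": "Bicycle maintenance", "subject": "bicycle", "type": "Manual"},
]


def search(query):
    """Input side: normalize the query to its preferred term, then match tags."""
    term = SYNONYMS.get(query.lower(), query.lower())
    return [item for item in ITEMS if item["subject"] == term]


def group_by_facet(results, facet):
    """Output side: organize results into groups keyed by one facet value."""
    groups = defaultdict(list)
    for item in results:
        groups[item[facet]].append(item["title"])
    return dict(groups)


print(search("car") == search("automobile"))
print(group_by_facet(search("car"), "type"))
```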
Taxonomy in action on the results side • Position Category • Company • City • State • Salary
Where do the benefits come from? Common taxonomy ROI scenarios • Catalog site - ROI based on increased sales through improved: • Product findability • Product cross-sells and up-sells • Customer loyalty • Call center - ROI based on cutting costs through: • Fewer customer calls due to improved website self-service • Faster, more accurate CSR responses through better information access • Compliance – ROI based on: • Avoiding penalties for breaching regulations • Following required procedures (e.g. medical claims) • Knowledge worker productivity - ROI based on cutting costs through: • Less time searching for things • Less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs • Executive mandate • No ROI at the start, just someone with a vision and the budget to make it happen For more details on taxonomy ROI, and other topics, see http://www.taxonomystrategies.com/presentations/Taxonomy_1-2-3a.ppt
How do I sell Management on a Taxonomy Project? • Don’t sell “metadata” or “taxonomy”, sell the vision of what you want to be able to do. • Clearly understand what the problem is and what the opportunities are. • Calculate costs and benefits so you can explain the ROI in a believable manner. • Design the taxonomy (in terms of level of effort) in relation to the value at hand.
Who should build the taxonomy? • The taxonomy (and metadata specification) should be produced by a cross-functional team which includes business, technical, information management, and content creation stakeholders. • The team should plan on maintaining the taxonomy as well as building it. • Maintenance will not (usually) be anyone’s full-time job. • Exact mix of people on team will change. • It should be built in an iterative fashion, with more content and broader review for each iteration.
How Do We Build a Taxonomy? • Know the ROI case – what is the benefit you want and what can you afford in the way of tagging, software, and other expenses. • Know the content to be categorized and the people who will use it. Have an idea of the UI they will use to access the content. • Get the team together. • Go through the process, in an iterative manner.
How do we build a Taxonomy: Process Overview
1. Identify Objectives: Conduct interviews
2. Inventory Resources: Identify, gather & review resources
3. Specify Metadata: Define fields & purpose
4. Model Content: Define content chunks & XML DTDs
5. Specify Vocabularies: Compile controlled vocabularies
6. Specify Procedures: Develop workflow, rules & procedures
7. Test & Train: Manually tag sample
Timeline: weeks 1 to 12
Building a Taxonomy: Which fields need controlled values? These five elements are the ones that take the most thought when defining a metadata spec. These 15 fields are the Dublin Core – the starting point for most modern metadata specs.
How big should the taxonomy be? • Consultant’s answer – “It depends” • How much content do you need to organize? • How fine-grained does the categorization need to be? • Overly simplistic method: • Nterms = # items / desired bucket size • (1 M documents, 100 documents / bucket => 10k buckets) • Bad method – documents don’t distribute evenly • Second method: • # facets ≈ log10(# items) ± 2 • (1 M items => 5..7 facets) • Sum of terms across all facets < 1200 in most cases
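The two sizing heuristics can be worked through in Python. This is a sketch, assuming the logarithm is base 10; the function names are hypothetical. Note that the stated rule is ± 2, while the slide's worked example (5..7 facets for 1 M items) corresponds to a spread of 1, so the spread is left as a parameter here.

```python
import math


def terms_by_bucket_size(n_items, bucket_size):
    """Overly simplistic estimate: assumes items spread evenly over terms."""
    return n_items // bucket_size


def facet_range(n_items, spread=2):
    """Facet-count estimate: log10(items) plus or minus `spread`."""
    center = round(math.log10(n_items))
    return max(1, center - spread), center + spread


print(terms_by_bucket_size(1_000_000, 100))  # 10000 buckets
print(facet_range(1_000_000))                # (4, 8) with the stated spread of 2
print(facet_range(1_000_000, spread=1))      # (5, 7), matching the slide's example
```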
How do we know we have a good taxonomy? For much more on “Testing Your Taxonomy”, see http://www.taxonomystrategies.com/presentations/Taxonomy_Testing-2006-11-03.ppt
What if I have to do it solo?
Realize:
• It's not totally solo – IT help, Graphics & UI help, Business Goals help, Funding help, Review & QA help…
• You are the general contractor
• It needs to be part of your objectives
• Limit the objectives to what can be achieved by you, and by your organization
Concentrate:
• Resource allocation (i.e. manage your time)
• Fundamental processes: query log examination, error correction procedure
• Communications!!!
Cherry-pick from roles on a larger team:
• Business Lead – align with organization goals, get needed resources, make cost/benefit decisions, report upstairs
• IT Liaison – work with IT specialists to get software installed, logs gathered, content harvested, etc. Consider impact of changes on tools and data
• Taxonomy / Search Specialist – analyze behavior and suggest changes. Implement changes which pass cost/benefit muster
• Website/User Representative – consider impact of changes on users and job performance
Agenda • FAQs – Frequently Asked Questions • SAQs – Seldom Asked Questions • Your Questions
What should I be thinking about at the start of a taxonomy project? Taxonomy development is not the most important problem: • The Taxonomy Problem: How are we going to maintain the lists of pre-defined values that can go into some of the metadata elements? • The Tagging Problem: How are we going to populate metadata elements with complete and consistent values? • What can we expect to get from automatic classifiers? What kind of error detection and error correction procedures do we need? What fields do we need? • The ROI (Return On Investment) Problem: How are we going to use content, metadata, and vocabularies in applications to obtain business benefits? • More sales? Lower support costs? Greater productivity? Risk avoidance? • How much content? How big an operating budget? How to expose to users? Business Goals and Cultural Factors are major influences on tagging and taxonomy. These must be acknowledged at the start to avoid rework.
What must change when the Taxonomy changes? There’s more to maintaining the Taxonomy than maintaining just the taxonomy. • The master copy of the taxonomy. • Announcements for stakeholders! • The information sent to downstream users of the taxonomy. • The versions and formats of the taxonomy distributed to others. • The list of changes. • The data tagged with the taxonomy? • The user interface which uses the taxonomy? • Backend system software which uses the taxonomy? • The training set for automatic classifiers? • The educational material for users, catalogers, programmers, etc.?
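Two items on this list, the list of changes and the data tagged with the taxonomy, lend themselves to a small sketch. The Python below assumes a flat vocabulary held as a set of terms; the vocabulary versions, document IDs, and helper names are hypothetical.

```python
def diff_vocabulary(old_terms, new_terms):
    """Build a change list between two versions of a flat controlled vocabulary."""
    return {
        "added": sorted(new_terms - old_terms),
        "removed": sorted(old_terms - new_terms),
    }


def items_needing_retag(tagged_items, removed_terms):
    """Find tagged content that still uses a term the new version dropped."""
    removed = set(removed_terms)
    return sorted(item for item, tags in tagged_items.items() if removed & set(tags))


# Hypothetical versions of a "Type" vocabulary and some tagged content.
v1 = {"Article", "FAQ", "Memo"}
v2 = {"Article", "FAQ", "Policy"}
tagged = {"doc-1": ["FAQ"], "doc-2": ["Memo"], "doc-3": ["Article", "Memo"]}

changes = diff_vocabulary(v1, v2)
print(changes)
print(items_needing_retag(tagged, changes["removed"]))
```

A change list like this is also what downstream users and stakeholders would need in an announcement.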
Agenda • FAQs – Frequently Asked Questions • SAQs – Seldom Asked Questions • Your Questions
Fun Questions • Examples of good and bad taxonomies This was created to be as bad a classification as possible. What makes it so bad? The animals are divided into: (a) belonging to the emperor, (b) embalmed, (c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies. Jorge Luis Borges, "THE ANALYTICAL LANGUAGE OF JOHN WILKINS", Works in 3 volumes (in Russian). St. Petersburg, "Polaris", 1994. V. 2: 87.
Why do we usually recommend faceted taxonomies?
Main Ingredients: • Chocolate • Dairy • Fruits • Grains • Meat & Seafood • Nuts • Olives • Pasta • Spices & Seasonings • Vegetables
Meal Type: • Breakfast • Brunch • Lunch • Supper • Dinner • Snack
Cuisines: • African • American • Asian • Caribbean • Continental • Eclectic/ Fusion/ International • Jewish • Latin American • Mediterranean • Middle Eastern • Vegetarian
Cooking Methods: • Advanced • Bake • Broil • Fry • Grill • Marinade • Microwave • No Cooking • Poach • Quick • Roast • Sauté • Slow Cooking • Steam • Stir-fry
• Categorize in multiple, independent categories. • Allow combinations of categories to narrow the choice of items. • 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4) • Easier to maintain • Easier to reuse existing material • Can be easier to navigate, if software supports it
42 values to maintain (10+6+11+15); 9,900 combinations (10x6x11x15)
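The arithmetic behind the slide can be checked with a short Python sketch. The facet sizes are the counts of values listed for each facet; the variable names are illustrative.

```python
from math import prod

# Facet sizes as counted from the lists on the slide.
FACET_SIZES = {
    "Main Ingredients": 10,
    "Meal Type": 6,
    "Cuisines": 11,
    "Cooking Methods": 15,
}

values_to_maintain = sum(FACET_SIZES.values())  # 10 + 6 + 11 + 15
combinations = prod(FACET_SIZES.values())       # 10 x 6 x 11 x 15

print(values_to_maintain)  # 42
print(combinations)        # 9900
print(10 ** 4)             # 4 facets of 10 nodes each: 10000 combinations
```

The maintenance cost grows with the sum of the facet sizes, while the number of distinguishable combinations grows with their product.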
What could possibly go wrong with a little edit? • ERP (Enterprise Resource Planning) team made a change to the product line data element in the product hierarchy. • They did not know this data was used by downstream applications outside of ERP. • An item data standards council discovered the error. • If the error had not been identified and fixed, the company’s sales force would not be correctly compensated. “Lack of the enterprise data standards process in the item subject area has cost us at least 30 person days of just ‘category’ rework.” Source: Danette McGilvray, Granite Falls Consulting, Inc.
When should we NOT use facets?
• When you have to work with software that can’t handle them. Remember, software is replaced but data is migrated.
• When you need to use an existing standard taxonomy.
• …
Facets can help you build a useful hierarchy. This one is a mix of content type and organization:
By Content Type: Calendars & Events; Top Links…; Holidays; Upcoming Events; Federal Reserve System…; Beige Book; Board of Governors; FOMC; More Calendars & Events…; ERAC; Officer Availability; Staff Conference; Toastmasters; Tours; Directories; Documentation; Forms; News; Policies & Procedures
By Organization: Federal Reserve System; FRB Atlanta; Board of Directors; Executive Office; Management Committee; Research Division; S&R Division
Questions? Ron Daniel 925-368-8371 rdaniel@taxonomystrategies.com