This presentation introduces maturity models as a way to benchmark an organization's search, metadata, and taxonomy work. It surveys the levels of process sophistication found in practice, shows how a maturity model can predict likely problems and distinguish good practices from bad ones, and offers guidance on taxonomy governance and tool selection, along with a quick quiz for assessing your own organization's metadata maturity.
Benchmarking Your Search Function: A Metadata Maturity Model
Ron Daniel, Jr.
Taxonomy Strategies LLC
Motivating Experiences
• Different organizations have different levels of sophistication in their planning, execution, and follow-up for CMS, Search, Portal, Metadata, and Taxonomy projects.
• Last year we had back-to-back engagements with clients at very different levels of sophistication.
• Tool vendors continue to provide ever-more capable tools with ever-more sophisticated features.
• We live in a world where a significant fraction of public commercial web pages don't have a <title> tag (see the sketch below).
  • Organizations that can't manage <title> tags stand a very poor chance of putting an entity extractor to use, since that requires some management of the lists of entities to be extracted.
• Taxonomy governance processes must fit the organization in terms of scale and complexity.
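To make the <title> point concrete, here is a minimal audit sketch in Python. It is not from the original talk; the URL list is a placeholder, and a real crawl would need politeness delays and broader error handling.

import urllib.request
from html.parser import HTMLParser

class TitleFinder(HTMLParser):
    """Sets has_title if the page contains any <title> start tag."""
    def __init__(self):
        super().__init__()
        self.has_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.has_title = True

def audit(urls):
    for url in urls:
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError as err:
            print(f"{url}: fetch failed ({err})")
            continue
        finder = TitleFinder()
        finder.feed(html)
        print(f"{url}: {'ok' if finder.has_title else 'MISSING <title>'}")

audit(["https://example.com/"])  # placeholder; substitute your own crawl list

Pages that fail even this check are unlikely to carry the richer metadata that entity extraction and faceted search depend on.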
Desiderata
• Wanted a method to:
  • Predict likely sources of problems in engagements
  • Help clients identify the things they can do, and the things that stand an excellent chance of failing
  • Generally identify good and bad practices
• These desiderata are not unique:
  • Such methods have been defined for software development and other areas
  • They are known as Maturity Models
Goals for this Talk
• Provide you with basic knowledge of maturity models
• Give you the tools to do a simple self-assessment of your organization's metadata maturity
• Suggest practices that are, and are not, likely next steps in your organization's development of:
  • Processes to manage search, metadata, and taxonomy deployments (overly sophisticated processes will fail)
  • Expertise around search, metadata, and taxonomies
  • Systems to create, manage, or use metadata and taxonomies
  • Tool selection (overly sophisticated tools will be very poor value for money)
• Have some fun
A Tale of Two Maturity Models
• CMMI (Capability Maturity Model Integration)
• vs.
• The Joel Test
CMMI's Levels of Maturity, Translated
• 1) Initial: You build software as if you have never done it before and will never do it again. One hero spits out code, and you don't worry about maintaining or documenting it. Whatever the programmer gives you is good enough for the end users.
• 2) Repeatable: You actually have a project plan, and the plan might even include some quality assurance, documentation, and things like that.
• 3) Defined: You follow the plan, which is at the organizational level rather than the project level. You expect to train people, have compatible software, and follow organizational standards. Think of skilled craftsmen following a blueprint and using the standards of their trade.
• 4) Managed: The organization follows the plan and measures progress as it goes, similar to an assembly line for software. Managers know what's happening as it happens, and the software is also monitored.
• 5) Optimizing: The final phase is when the factory becomes self-aware. The lessons learned on the project are used to prevent defects before they occur and to manage technological changes. There's a constant, organized feedback mechanism to improve cycle time and product quality.
Source: Joe Celko, "Modeling Data Management", a report on discussions of metadata maturity at the 2002 DAMA Conference. http://www.intelligententerprise.com/020726/512celko1_1.jhtml
22 Process Areas, Keyed to 5 Maturity Levels
• Process Areas contain Specific and Generic Practices, organized by Goals and Features.
• Maturity Model Axioms:
  • A Maturity Level is not achieved until ALL the Practices in that level are in operation (see the sketch below).
  • Individual processes at higher levels are AT RISK from supporting processes at lower levels.
• These axioms are very questionable for the Metadata Maturity Model.
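To make the first axiom concrete, here is a minimal sketch of the level computation it implies. The practice names are invented placeholders, not the real CMMI process areas.

REQUIRED = {  # practices that must ALL be in operation at each level
    2: {"project plan", "requirements management"},
    3: {"organizational standards", "training program"},
    4: {"quantitative process management"},
    5: {"defect prevention", "change management"},
}

def maturity_level(in_operation):
    level = 1  # everyone starts at Initial
    for n in sorted(REQUIRED):
        if REQUIRED[n] <= in_operation:  # every practice at this level present?
            level = n
        else:
            break  # one gap blocks this level and every level above it
    return level

# Level 2 complete, level 3 missing "training program" -> stuck at 2
print(maturity_level({"project plan", "requirements management", "organizational standards"}))

One missing practice at level 3 caps the organization at level 2 no matter what it does at levels 4 and 5, which is exactly the all-or-nothing property that is questionable for a metadata maturity model.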
CMMI Structure
(Diagram; the previous diagram only shows two of these levels. Source: http://chrguibert.free.fr/cmmi)
• Maturity Models are collections of Practices.
• Main differences between Maturity Models concern:
  • Degree of categorization of Practices
  • Descriptivist or prescriptivist purpose
CMMI Positives
• Independent audits of an organization's level of maturity are a common service.
• Level 3 certification is frequently required in bids.
• "…compared with an average Level 2 program, Level 3 programs have 3.6 times fewer latent defects, Level 4 programs have 14.5 times fewer latent defects, and Level 5 programs have 16.8 times fewer latent defects." Michael Diaz and Jeff King, "How CMM Impacts Quality, Productivity, Rework, and the Bottom Line"
• 'If you find yourself involved in product liability litigation you're going to hear terms like "prevailing standard of care" and "what a reasonable member of your profession would have done". Considering the fact that well over a thousand companies world-wide have achieved level 3 or above, and the body of knowledge about the CMM is readily available, you might have some explaining to do if you claim ignorance.' Linda Zarate, in a review of A Guide to the CMM: Understanding the Capability Maturity Model for Software by Kenneth M. Dymond
CMMI Negatives
• Complexity and expense:
  • Reading and understanding the materials
  • Putting it into action: identifying processes, mapping processes to the model, gathering required data, …
  • Audits are expensive
• CMMI does not scale down well to small shops.
• It has been accused of restraint of trade.
At the Other Extreme, The Joel Test
• Developed by Joel Spolsky as a reaction to CMMI complexity.
• The twelve questions:
  1. Do you use source control?
  2. Can you make a build in one step?
  3. Do you make daily builds?
  4. Do you have a bug database?
  5. Do you fix bugs before writing new code?
  6. Do you have an up-to-date schedule?
  7. Do you have a spec?
  8. Do programmers have quiet working conditions?
  9. Do you use the best tools money can buy?
  10. Do you have testers?
  11. Do new candidates write code during their interview?
  12. Do you do hallway usability testing?
• Scoring: 1 point for each 'yes'. A score of 10 or below indicates serious trouble.
• Positives: Quick, easy, and inexpensive to use.
• Negatives: Doesn't scale up well. Not a good way to assure the quality of nuclear reactor software. Not suitable for scaring away liability lawyers. Not a longer-term improvement plan.
A Maturity Rant, in Bullet Points
• Metadata maturity may not be core to your business.
• Maturity is not automatically a good thing.
• Maturity is not a goal; it is a characterization of an organization's methods for achieving its core goals.
• Mature processes impose expenses which must be justified by consequent cost savings, revenue gains, or service improvements.
• "Immature processes" does not mean "can't do good work". It means "good results depend on whether the company's star performers are doing the job".
• Maturity predicts the worst that an organization might do on a job, not the best that it could do.
• Nevertheless, Maturity Models are useful as collections of best practices, and as stages in which to try to adopt them.
Caveats, Disclaimers, Provisos, Exclusions, Exemptions, and Limitations on Liability
• Some maturity models are based on millions of dollars of research and decades of industry experience.
• This isn't one of them.
• Adjust your expectations accordingly.
Basis for Following Materials
• CEN/ISSS study on commercial adoption of Dublin Core
• Small-scale phone survey:
  • Organizations which have world-class search and metadata externally
  • Not necessarily the most mature overall processes, or the best internal search and metadata
• Literature review
• Client experiences
Search and Metadata Maturity Quick Quiz
• Basic:
  • Is there a process in place to examine query logs?
  • Is there a process for adding directories and content to the repository, or do people just do what they want?
  • Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?
• Intermediate:
  • Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)?
  • Does the search engine index more than 4 repositories around the organization?
  • Are system features and metadata fields added based on cost/benefit analysis, rather than on what is easy to do with the current tools?
  • Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?
  • Are there hiring and training practices especially for metadata and taxonomy positions?
• Advanced:
  • Are there established qualitative and quantitative measures of metadata quality?
  • Can the CEO explain the ROI for search and metadata?
(A simple scoring sketch follows.)
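For a rough self-assessment, the quiz can be tallied in a few lines. The question keys and sample answers below are shorthand inventions for illustration, not part of the original quiz.

QUIZ = {  # tier -> shorthand keys for the questions above
    "basic":        ["query_log_process", "content_addition_process", "org_metadata_standard"],
    "intermediate": ["rot_cleansing", "indexes_5plus_repositories", "cost_benefit_features",
                     "requirements_before_tools", "metadata_hiring_practices"],
    "advanced":     ["metadata_quality_measures", "ceo_knows_roi"],
}

def score(answers):
    """Print yes-counts per tier; unanswered questions count as 'no'."""
    for tier, questions in QUIZ.items():
        yes = sum(answers.get(q, False) for q in questions)
        print(f"{tier}: {yes}/{len(questions)}")

score({"query_log_process": True, "org_metadata_standard": True, "rot_cleansing": True})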
Baseline for Comparison
(Chart: quiz results from 14 responses out of 35 attendees at a taxonomy workshop.)
Aspects of Search and Metadata Maturity
• We are collecting and categorizing Processes by Area and Level.
• "Limiting" Processes are harmful practices which interfere with maturity.
Search Capabilities: Processes, Categorized by Type and Level
• Basic:
  • "Uniform Search Box"
  • "Query Log Examination" (requires reporting functions and an identified staffer; see the sketch below)
• Intermediate:
  • "Index Multiple Repositories" (beyond simple web spidering)
  • "Best Bets"
  • "Simple Results Grouping"
• Advanced:
  • "Improved Ranking from Link and Popularity Analysis"
  • "Intranet Facet Navigation"
• See Rosenfeld's EIA Roadmap for more details on search capabilities staged over time.
(In the original slide, highly valuable processes are shown in orange.)
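As a sketch of what basic "Query Log Examination" involves, the following finds the most frequent queries and the queries that returned zero results. The query-TAB-hit-count log format is an assumption made here; adapt the parsing to whatever your search engine actually writes.

from collections import Counter

def examine(log_path, top_n=20):
    freq, zero_hits = Counter(), Counter()
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            query, _, hits = line.rstrip("\n").partition("\t")
            query = query.strip().lower()
            if not query:
                continue
            freq[query] += 1
            if hits.strip() == "0":
                zero_hits[query] += 1
    print("Top queries:", freq.most_common(top_n))
    print("Zero-result queries:", zero_hits.most_common(top_n))

examine("queries.log")  # path is a placeholder

Zero-result queries found this way are natural candidates for "Best Bets" entries or new taxonomy terms.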
Metadata and Taxonomy Standards
• Basic:
  • "System Metadata Standards"
• Intermediate:
  • "Defined Organizational Metadata Standard" (a compliance-check sketch follows this slide)
  • "Reuse of ERP Vocabularies"
• Advanced:
  • "Multiple Repositories Comply with Metadata Standard"
  • "Taxonomy Roadmap": a plan for adding facets over time, based on known upcoming projects which can use them. Requires a "Multi-Year Plan of Upcoming Projects".
• Bleeding Edge:
  • "Highly Abstract Subject Taxonomies", e.g. categorization by Mood & Emotion
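A hedged sketch of what checking records against a "Defined Organizational Metadata Standard" might look like. The required fields, the myorg: extension element, and the controlled vocabulary are all invented for illustration; a real standard would be an agreed Dublin Core profile.

REQUIRED_FIELDS = {"dc:title", "dc:creator", "dc:date", "dc:subject", "myorg:businessUnit"}
SUBJECT_VOCAB = {"products", "support", "hr", "finance"}  # placeholder taxonomy terms

def compliance_errors(record):
    """Return a list of problems; an empty list means the record complies."""
    errors = [f"missing {field}" for field in REQUIRED_FIELDS - record.keys()]
    subject = record.get("dc:subject")
    if subject is not None and subject not in SUBJECT_VOCAB:
        errors.append(f"dc:subject '{subject}' not in controlled vocabulary")
    return errors

record = {"dc:title": "Q3 Report", "dc:creator": "R. Daniel", "dc:subject": "finance"}
print(compliance_errors(record))  # reports the two missing fields (order may vary)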
"Organizational Metadata Standard": How is Dublin Core extended?
(Chart. Base: 20 corporate information managers. Source: CEN/ISSS Workshop on Dublin Core, "Guidance information for the deployment of Dublin Core metadata in Corporate Environments".)
Tools and Tool Selection
• Limiting:
  • "Use of Unneeded Tool Capabilities", e.g. autogenerated keywords
  • "Tools, then Requirements" (related to "Use It or Lose It Budgeting")
• Basic:
  • "Purpose, then Requirements, then Tools"
• Intermediate:
  • "Datasets for Product Evaluations" (see the evaluation sketch below)
• Advanced:
  • "Budgeted Evaluations"*
* Hypothetical, not yet observed in survey participants
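As an illustration of "Datasets for Product Evaluations", here is a minimal precision-at-k harness. The relevance judgments and the toy engine are stand-ins for a real labeled dataset and a vendor tool under test.

JUDGMENTS = {  # query -> set of document ids judged relevant
    "expense policy": {"doc12", "doc40"},
    "vpn setup":      {"doc7"},
}

def precision_at_k(search_fn, k=5):
    """Mean fraction of the top-k results judged relevant, across all queries."""
    scores = []
    for query, relevant in JUDGMENTS.items():
        results = search_fn(query)[:k]
        hits = sum(1 for doc_id in results if doc_id in relevant)
        scores.append(hits / k)
    return sum(scores) / len(scores)

def toy_engine(query):  # placeholder for a candidate tool's search call
    return ["doc12", "doc3", "doc7", "doc9", "doc40"]

print(f"precision@5 = {precision_at_k(toy_engine):.2f}")

Running the same judgments against each candidate tool turns a vague bake-off into a comparable number, which is the point of building the dataset before the purchase.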
Staff Training and Hiring
• Basic:
  • "Search Analyst Role" (related to "Query Log Examination")
• Intermediate:
  • "Adding and Appointing Library Expertise"
• Advanced:
  • "Pre-Hire Testing"
• Bleeding Edge:
  • "Hiring Subject Matter Experts for Cataloging"
Data Creation and QA
• Basic:
  • "Content Management Introduced"
• Intermediate:
  • "ROT-Elimination" (a rough sketch follows this slide)
• Advanced:
  • "Hybrid Metadata Creation Models"
• Bleeding Edge:
  • "Adaptive Qualification of End-User Feedback"
  • "Qualitative and Quantitative Measures of Metadata Quality"*
* Hypothetical, not yet observed in survey participants
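A rough sketch of a ROT sweep, flagging files that look Obsolete (unmodified for years), Trivial (near-empty), or Redundant (byte-identical duplicates). The thresholds are arbitrary examples, and a production cleanse would route these flags into a review workflow rather than acting on them automatically.

import hashlib, os, time

def rot_scan(root, max_age_years=3, min_bytes=200):
    cutoff = time.time() - max_age_years * 365 * 24 * 3600
    seen_hashes = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            if info.st_mtime < cutoff:
                print(f"OBSOLETE  {path}")
            if info.st_size < min_bytes:
                print(f"TRIVIAL   {path}")
            digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
            if digest in seen_hashes:
                print(f"REDUNDANT {path} duplicates {seen_hashes[digest]}")
            else:
                seen_hashes[digest] = path

rot_scan("/var/www/intranet")  # path is a placeholder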
Methods Used to Create & Maintain Metadata: Note that Automation ≠ Maturity
(Chart. Base: 20 corporate information managers. Source: CEN/ISSS Workshop on Dublin Core, "Guidance information for the deployment of Dublin Core metadata in Corporate Environments".)
Project Management
• Basic:
  • "Project Plan"
• Intermediate:
  • "Standard Project Methodology"
  • "Cross-functional Teams"
  • "Communication Plan"
  • "Multi-Year Plan of Upcoming Projects"
• Advanced:
  • "Early Termination of Projects"
• See the Enterprise Search Report for much more on managing a search project.
Executive Support and ROI
• Limiting:
  • "Use It or Lose It Budgeting"
• Basic:
  • "External Search ROI"
• Intermediate:
  • "Intranet ROI Model" (a back-of-the-envelope sketch follows this slide)
• Advanced:
  • "CEO Knows Search ROI"
• See the Enterprise Search Report for much more on search ROI.
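The usual shape of an "Intranet ROI Model" is simple arithmetic: time saved per search, times search volume, times a loaded labor rate. Every number in this sketch is an assumption to be replaced with your own measurements.

employees          = 5_000
searches_per_day   = 2      # per employee (assumed)
minutes_saved      = 3      # per search, better search vs. worse (assumed)
loaded_hourly_rate = 60.0   # dollars, salary plus overhead (assumed)
work_days          = 230    # per year (assumed)

hours_saved  = employees * searches_per_day * work_days * minutes_saved / 60
annual_value = hours_saved * loaded_hourly_rate
print(f"{hours_saved:,.0f} hours/year, roughly ${annual_value:,.0f}/year")

Numbers like these are easy to inflate, which is one reason intranet search ROI is harder to sell to executives than external search ROI.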
Conclusions
• Remember the rant: maturity is a characterization of the way an organization achieves its goals, not a goal in and of itself.
• Not all search needs are created equal:
  • Stock photo agencies are tops at search on their external sites, but their intranets are no better than anyone else's because the ROI is not clear.
  • Consulting agencies have better intranets and KM efforts because of the clearer ROI.
• High maturity really means a metrics emphasis; some organizations believe that is inappropriate for them.
• Use this model as a guide to decide where to improve, and to decide which processes may be more sophisticated than your organization can handle.
• Keep in mind the difference between organizational and team sophistication: a specific team may do some very advanced things even if the organization around them is not "mature".
Recommended Reading
• CMMI: http://chrguibert.free.fr/cmmi (the official site is http://www.sei.cmu.edu/cmmi/, but it is not the most comprehensible)
• The Joel Test: http://www.joelonsoftware.com/articles/fog0000000043.html
• EIA Roadmap: http://www.louisrosenfeld.com/presentations/031013-KMintranets.ppt
• Enterprise Search Report: http://www.cmswatch.com/EntSearch/
Contact Info
Ron Daniel, 925-368-8371, rdaniel@taxonomystrategies.com
Joseph Busch, 415-377-7912, jbusch@taxonomystrategies.com