Search, Metadata & Taxonomy Survey: Background and Results


Presentation Transcript


  1. Search, Metadata & Taxonomy Survey: Background and Results • Ron Daniel, Jr., Taxonomy Strategies LLC • Seth Earley, Earley & Associates

  2. Motivating Experiences • Different organizations have different levels of sophistication in their planning, execution, and follow-up for CMS, Search, Portal, Metadata, and Taxonomy projects. • Last year we had back-to-back engagements with clients at very different levels of sophistication. • Tool vendors continue to provide ever-more-capable tools with ever-more-sophisticated features. • But we live in a world where a significant fraction of public, commercial web pages don't have a <title> tag. • Organizations that can't manage <title> tags stand a very poor chance of putting an entity extractor to use, since entity extraction requires some management of the lists of entities to be extracted (a simple <title> check is sketched below). • Taxonomy governance processes must fit the organization in terms of scale and complexity.
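The <title> problem is easy to audit for yourself. Here is a minimal sketch, using only the Python standard library, that flags pages missing a <title> tag; the URL list is a placeholder for your own crawl, not anything from the original survey.

```python
# Minimal sketch: flag pages that lack a <title> tag.
# Uses only the standard library; the URL list is illustrative.
from html.parser import HTMLParser
from urllib.request import urlopen

class TitleFinder(HTMLParser):
    """Records whether a <title> start tag was seen anywhere in the page."""
    def __init__(self):
        super().__init__()
        self.has_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.has_title = True

def page_has_title(url):
    html = urlopen(url).read().decode("utf-8", errors="replace")
    finder = TitleFinder()
    finder.feed(html)
    return finder.has_title

for url in ["https://example.com/"]:  # replace with your own crawl list
    print(url, "OK" if page_has_title(url) else "MISSING <title>")
```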

  3. Desiderata • Wanted a method to: • Predict likely sources of problems in engagements • Help clients identify the things they can do, and the things that stand an excellent chance of failing • Generally identify good and bad practices • These desiderata are not unique • In the area of Software Development, considerable effort has gone into the development of “Maturity Models” • We have started to develop a Metadata Maturity Model • To keep the model tied to reality, we are conducting surveys to determine the actual state of practice around search, metadata, taxonomy, and supporting business functions such as staffing and project management.

  4. Goals for this Talk • Provide you with background on maturity models • Provide the results of our first survey of Search, Metadata, & Taxonomy practices and discuss interesting findings • Give you the tools to do a simple self-assessment of your organization’s metadata maturity • Help you decide what practices are, and are not, likely next steps in your organization’s development of: • Governance Processes to manage search, metadata, and taxonomy deployments • Expertise around search, metadata, and taxonomies • Systems to create, manage, or use metadata and taxonomies • Tool selection

  5. A Tale of Two Software Maturity Models • CMMI (Capability Maturity Model Integration) vs. The Joel Test

  6. CMMI Structure • Maturity Models are collections of Practices. • Main differences in Maturity Models concern: • Descriptivist or Prescriptivist Purpose • Degree of Categorization of Practices • Number of Practices (~400 in CMMI) Source: http://chrguibert.free.fr/cmmi

  7. 22 Process Areas, Keyed to 5 Maturity Levels… • Process Areas contain Specific and Generic Practices, organized by Goals and Features • CMMI Axioms: • A Maturity Level is not achieved until ALL the Practices in that level are in operation. • Individual processes at higher levels are AT RISK from supporting processes at lower levels.

  8. CMMI’s Levels of Maturity, Translated • 1) Initial: You build software as if you have never done it before and will never do it again. One hero spits out code and you don't worry about maintaining or documenting it. Whatever the programmer gives you is good enough for the end users. • 2) Repeatable: You actually have a project plan, and the plan might even include some quality assurance, documentation, and things like that. • 3) Defined: You follow the plan, which is at the organizational level rather than the project level. You expect to train people, have compatible software, and follow organizational standards. Think of skilled craftsmen following a blueprint and using the standards of their trade. • 4) Managed: The organization follows the plan and measures progress as it goes, similar to an assembly line for software. Managers know what's happening as it happens, and the software is also monitored. • 5) Optimizing: The final phase is when the factory becomes self-aware. The lessons learned on the project are used to prevent defects before they occur and to manage technological changes. There's a constant, organized feedback mechanism to improve cycle time and product quality. Source: Joe Celko, “Modeling Data Management” – a report on discussions of Metadata Maturity at the 2002 DAMA Conference, http://www.intelligententerprise.com/020726/512celko1_1.jhtml

  9. CMMI Positives • Independent audits of an organization’s level of maturity are a common service. • Level 3 certification is frequently required in bids. • “…compared with an average Level 2 program, Level 3 programs have 3.6 times fewer latent defects, Level 4 programs have 14.5 times fewer latent defects, and Level 5 programs have 16.8 times fewer latent defects.” – Michael Diaz and Jeff King, “How CMM Impacts Quality, Productivity, Rework, and the Bottom Line” • ‘If you find yourself involved in product liability litigation you're going to hear terms like "prevailing standard of care" and "what a reasonable member of your profession would have done". Considering the fact that well over a thousand companies world-wide have achieved level 3 or above, and the body of knowledge about the CMM is readily available, you might have some explaining to do if you claim ignorance.’ – Linda Zarate, in a review of A Guide to the CMM: Understanding the Capability Maturity Model for Software by Kenneth M. Dymond

  10. CMMI Negatives • Complexity and Expense • Reading and understanding the materials • Putting it into action – identifying processes, mapping processes to model, gathering required data, … • Audits are expensive • CMMI does not scale down well to small shops • Has been accused of restraint of trade

  11. At the Other Extreme: The Joel Test • Developed by Joel Spolsky as a reaction to CMMI complexity. • Positives: quick, easy, and inexpensive to use. • Negatives: doesn’t scale up well. Not a good way to assure the quality of nuclear reactor software; not suitable for scaring away liability lawyers; not a longer-term improvement plan. • The twelve questions: Do you use source control? Can you make a build in one step? Do you make daily builds? Do you have a bug database? Do you fix bugs before writing new code? Do you have an up-to-date schedule? Do you have a spec? Do programmers have quiet working conditions? Do you use the best tools money can buy? Do you have testers? Do new candidates write code during their interview? Do you do hallway usability testing? • Scoring: 1 point for each ‘yes’; scores below 10 indicate serious trouble (a toy scoring script follows).
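Because the test is twelve yes/no questions, a self-assessment fits in a few lines. A toy sketch of the scoring rule from the slide; the answers shown are illustrative, not anyone's real scores:

```python
# Toy scoring of the Joel Test: 1 point per "yes"; below 10 is serious trouble.
questions = [
    "Do you use source control?",
    "Can you make a build in one step?",
    "Do you make daily builds?",
    "Do you have a bug database?",
    "Do you fix bugs before writing new code?",
    "Do you have an up-to-date schedule?",
    "Do you have a spec?",
    "Do programmers have quiet working conditions?",
    "Do you use the best tools money can buy?",
    "Do you have testers?",
    "Do new candidates write code during their interview?",
    "Do you do hallway usability testing?",
]
answers = [True, True, False, True, False, False,   # example answers only
           True, False, True, False, False, False]
score = sum(answers)
print(f"Joel Test score: {score}/{len(questions)}"
      + (" - serious trouble" if score < 10 else ""))
```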

  12. What does Software Development “Maturity” Really Mean? • A low score on a maturity audit DOES NOT mean that an organization can’t develop good software. • It DOES mean that whether the organization does a good job depends on the specific mix of people assigned to the project. • In other words, it sets a floor on how badly an organization is likely to do, not a ceiling on how well it can do. • The probability of failure is a good thing to know before spending a lot of time and money.

  13. Towards a Metadata Maturity Model • Caveats: • Maturity is not a goal; it is a characterization of an organization’s methods for achieving its core goals. • Mature processes impose expenses which must be justified by consequent cost savings, revenue gains, or service improvements. • Nevertheless, maturity models are useful as collections of best practices, and as stages in which to try to adopt them.

  14. Initial Metadata Maturity Model (ca. May 2005) • 37 practices, categorized by area and level

  15. Basis for Initial Maturity Model • CEN study on commercial adoption of Dublin Core • Small-scale phone survey • Organizations which have world-class search and metadata externally • Not necessarily the most mature overall processes or the best internal search and metadata • Literature review • Client experiences

  16. Shortcomings of the Initial Model • No idea of how it corresponds to actual practice across multiple organizations • The initial metadata maturity model can be regarded as a hypothesis about how an organization progresses through various practices as it matures • How to test it? Let’s ask! • Will ask about future, current, and former practices to gather information on progression • Will ask in stages because of large number of practices

  17. Search, Metadata, & Taxonomy Survey Results

  18. Participants by Organization Size Results presented later come from merging surveys for two groups. This chart is for the larger group (67 participants). The other survey had 20 participants.

  19. Participants by Job Role

  20. Participants by Industry

  21. Search Practices

  22. Metadata Practices • These two questions were the only ones with much correlation to organization size.

  23. Taxonomy Practices

  24. Notes from Participants • Validating metadata schema and vocabularies against metadata use scenarios. Defining change management and governance policies for both schemas and controlled vocabularies. [Would be helpful if you clarified the definition of taxonomy you are using in the last question; not clear if it's intended to be navigational/browse structures, or controlled vocabularies/thesauri.] • We use a wiki with categories and page-specific tags to locate interesting stuff in our corporate memory. • The e-commerce division has a taxonomist on staff who analyzes the configuration of the Verity search on the e-commerce web site. She maintains a taxonomy and works with programmers to change search parameters to optimize the user search. • We have a GIS (Geographic Information Systems) metadata editor, and the data is stored in an Oracle DB (SDE) together with each stored layer. • Approaches to metadata QA are an enormous, currently unaddressed problem. Gov of [X] has a minimum level of compliance required to a common metadata scheme (Dublin Core subset). Convincing individual depts of the value to move beyond the minimum, and to evaluate the quality of what has been done, exploit it, and move beyond, is successful only in isolated pockets. Central agencies [X] try but have no stick to use to enforce. Mostly a bottom-up effort from an increasingly weary group of sloggers. • Change Control Board; Site Registration System being developed. • 1) Standard corp taxo (products, capabilities, industries, doc type) drives tagging (from doc repository) to serve up docs/collateral to appropriate pages on internal and external web presences, and to CRM so sales reps can retrieve them through that interface. 2) Standard corp taxo is used by CMS, doc repository, and CRM.
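To make the "minimum level of compliance" idea concrete, here is a hedged sketch that checks a page's <meta> tags against a small Dublin Core subset. The required-element list and the sample document are assumptions for illustration, not the participant's actual scheme.

```python
# Hedged sketch: check HTML <meta> tags against a Dublin Core subset.
# REQUIRED is a hypothetical minimum set, not any government's real scheme.
from html.parser import HTMLParser

REQUIRED = {"DC.title", "DC.creator", "DC.date"}

class DCMetaScanner(HTMLParser):
    """Collects the names of Dublin Core <meta> elements found in a page."""
    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            name = dict(attrs).get("name", "")
            if name in REQUIRED:
                self.found.add(name)

doc = '<meta name="DC.title" content="Annual Report">'  # sample input
scanner = DCMetaScanner()
scanner.feed(doc)
print("Missing elements:", REQUIRED - scanner.found or "none")
```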

  25. Interim Conclusions

  26. Observations (1) • Practices which a single person or a small group can carry out are more commonly used. • Not surprising. • Very different from ERP/BPR; indicates that information management is not being sold to the “C-level” staff. • People need to question how inclusive their “Organizational Metadata Standards” and “Taxonomy Roadmaps” actually are. • We have found taxonomy roadmaps to be an advanced practice, due to a dependence on knowing the upcoming IT development schedule. • Many of the basics are being skipped. • More organizations are doing “Spell Checking” than “Query Log Analysis” (a minimal log-analysis sketch follows).
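Query log analysis, the skipped basic, needs very little machinery for a first pass. The sketch below assumes a hypothetical tab-separated log with one query and its result count per line; real search engines log differently, so treat the file name and format as assumptions.

```python
# First-pass query log analysis: top queries and top zero-result queries.
# Assumes a hypothetical log format: query<TAB>result_count per line.
from collections import Counter

top_queries = Counter()
zero_hits = Counter()
with open("search_queries.log", encoding="utf-8") as log:
    for line in log:
        query, _, count = line.rstrip("\n").partition("\t")
        top_queries[query.lower()] += 1
        if count == "0":                 # searches that found nothing
            zero_hits[query.lower()] += 1

print("Most common queries:", top_queries.most_common(10))
print("Most common zero-result queries:", zero_hits.most_common(10))
```

Zero-result queries are often the fastest payoff: they point directly at missing content, missing synonyms, or spelling variants the taxonomy should absorb.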

  27. Observations (2) • Practices from the original MMM are more sophisticated than most participating organizations are ready for. • Makes me question whether the field really knows what the “Best Practices” are. • Will merge “Advanced” and “Bleeding Edge” practices in the model. • Will try to find very basic practices to better flesh out the Basic vs. Intermediate levels. • Many participants came from very small organizations. • Will change the size scale to include < 10, not just < 100. • Will try to have consultants use a special questionnaire.

  28. Next Survey: Job Roles, Governance Structures, and Hiring/Training Practices • What do you want to know about what other organizations are doing in these areas? • Job Titles and Responsibilities • Training and Experience Criteria • Organizational Structure • etc. • Send suggestions to rdaniel@taxonomystrategies.com and/or seth@earley.com

  29. Search and Metadata Maturity Quick Quiz • Basic • Is there a process in place to examine query logs? • Is there a process for adding directories and content to the repository, or do people just do what they want? • Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.? • Intermediate • Is there an ongoing data-cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)? (A rough ROT scan is sketched below.) • Does the search engine index more than 4 repositories around the organization? • Does the search engine integrate with the taxonomy to improve searches and organize results? • Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money? • Are there hiring and training practices especially for metadata and taxonomy positions? • Advanced • Are there established qualitative and quantitative measures of metadata quality? • Can the CEO explain the ROI for search and metadata?
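For the ROT question, a rough first pass can be automated. This sketch assumes a plain file share at a hypothetical path; it flags Redundant files by identical content hash and Obsolete ones by age, while "Trivial" is left to human judgment.

```python
# Rough ROT scan: Redundant = identical content hashes,
# Obsolete = untouched for ~2 years. ROOT is a hypothetical path.
import hashlib
import os
import time
from collections import defaultdict

ROOT = "/shared/docs"
TWO_YEARS = 2 * 365 * 24 * 3600

by_hash = defaultdict(list)
now = time.time()
for dirpath, _, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        with open(path, "rb") as f:
            by_hash[hashlib.sha256(f.read()).hexdigest()].append(path)
        if now - os.path.getmtime(path) > TWO_YEARS:
            print("Obsolete?", path)

for paths in by_hash.values():
    if len(paths) > 1:
        print("Redundant:", paths)
```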

  30. Recommended Reading • CMMI: http://chrguibert.free.fr/cmmi (the official site is http://www.sei.cmu.edu/cmmi/, but it is not the most comprehensible) • Joel Test: http://www.joelonsoftware.com/articles/fog0000000043.html • EIA Roadmap: http://www.louisrosenfeld.com/presentations/031013-KMintranets.ppt • Enterprise Search Report: http://www.cmswatch.com/EntSearch/

  31. Contact Info Ron Daniel, Jr. 925-368-8371 rdaniel@taxonomystrategies.com
