Improving Metadata Quality: An Iterative Approach Using Scrum

Clifford B. Anderson
Curator of Special Collections
Princeton Theological Seminary Library, Princeton, NJ (USA)

1 Background

A primary challenge when developing a large-scale digital library is balancing metadata quantity and quality. The Theological Commons is a digital library (built on a MarkLogic Server) with ~50,000 books (or ~16,000,000 pages). Since its release in March 2012, we have been iteratively improving its metadata quality using the Scrum process.

Fig. 3: An iterative cycle

Fig. 1: Metadata for a search result: example of framing user expectations

2 Methods

Scrum is a form of agile project management, which works in fixed iterations (“sprints”) to develop projects and features. As problems in metadata are identified, team members add them to the product backlog. The problems are described in story form, i.e., from the perspective of end users rather than technologists. The product owner takes responsibility for ordering these stories from most to least significant.

Generally, our team tackles metadata problems using some combination of computational methods and hand editing, with a preference for the former (for obvious reasons). We seek to pass through all the data during a single sprint.

At the end of the sprint, we review our work with the entire library staff. It’s not good enough to say that we’ve improved the metadata quality. We aim to demonstrate a new feature that exemplifies the improved quality of the metadata.

Fig. 4: A scatter plot showing the estimated distribution of errors in a book

3 Results

Here are some lessons we’ve learned as a team:
• The deeper you dig, the more you will find to improve. The quality improvement process is infinite.
• Aim shallow rather than deep so that you cover all documents in a single sprint.
• Keep a sharp focus: scope creep also affects metadata cleanup.
• When merging metadata from different sources, watch out for inconsistencies (even when all validate against the same schema).
• Don’t be parsimonious: if you think you might need some external metadata later, just build it into your document with a different namespace.
• Visualizing metadata problems can help you to understand them more intuitively (see Fig. 3).
• Aim to increase the percentage of computational approaches (hurray for XQuery!) with every sprint.
• Set user expectations with respect to metadata flaws (see Fig. 1).

4 Conclusion

The digital team approaches the problem of quality control in four stages:
• Identifying metadata problems and lacunae
• Prioritizing the most important stories and sending the rest to the product backlog
• Improving the product by implementing the metadata story during a single sprint, if possible
• Assessing the outcome and continuing to identify new metadata problems and lacunae

An iterative approach allows us to regularly improve metadata quality while continuously reevaluating our metadata priorities in response to stakeholders’ needs.

5 Questions

Here are some questions our team is thinking about:
• Should we develop a test suite for our metadata to flag obvious errors (beyond validation)?
• Are there reliable natural language processing toolkits to assist with improving automatically generated metadata (i.e., OCR)?
• How can we best frame user expectations when dealing with metadata deficiencies?

Thanks to the members of our digital team: Cortney Frank, Greg Murray, Donna Quick, and Chris Schwartz

Fig. 3: The Theological Commons site (http://commons.ptsem.edu/)
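
One way to explore the question of a metadata test suite that goes beyond schema validation is sketched below in Python. It is only an illustration under stated assumptions: the record shape and field names (id, title, year, pages), the 1450 lower bound on publication year, and the functions check_record and run_suite are all hypothetical, not the Theological Commons schema or the team's tooling (the poster mentions XQuery for its computational work).

```python
import re
from datetime import date

# Hypothetical record shape: the field names below are illustrative only,
# not the Theological Commons schema.
EARLIEST_PLAUSIBLE_YEAR = 1450  # assumed cutoff for printed books


def check_record(record, this_year=None):
    """Return a list of human-readable problems found in one metadata record."""
    this_year = this_year or date.today().year
    problems = []

    # A record can validate against a schema and still carry a blank title.
    title = (record.get("title") or "").strip()
    if not title:
        problems.append("missing title")

    # Years should be four digits and fall in a plausible range.
    year = str(record.get("year") or "").strip()
    if not re.fullmatch(r"\d{4}", year):
        problems.append(f"year is not four digits: {year!r}")
    elif not EARLIEST_PLAUSIBLE_YEAR <= int(year) <= this_year:
        problems.append(f"implausible year: {year}")

    # Page counts should be positive integers.
    pages = record.get("pages")
    if not isinstance(pages, int) or pages <= 0:
        problems.append(f"implausible page count: {pages!r}")

    return problems


def run_suite(records):
    """Map record id -> problems, keeping only records that fail a check."""
    report = {}
    for record in records:
        problems = check_record(record)
        if problems:
            report[record.get("id", "<no id>")] = problems
    return report
```

Run over the whole collection at the start of a sprint, a report like this could feed new stories into the product backlog; rerunning it at the end of the sprint gives a rough measure of what the sprint fixed.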
