
Metadata Quality for Federated Collections





  1. Metadata Quality for Federated Collections Besiki Stvilia, Les Gasser, Mike Twidale, Sarah Shreeves, Tim Cole GSLIS, UIUC November, 2004

  2. 1. Abstract • Centralized metadata repositories attempt to provide integrated access across multiple digital collections from libraries, archives, and museums. Metadata quality in these repositories heavily influences the collections' usability: high quality can raise satisfaction and use, while low quality can render collections unusable. Variances in metadata type, origin, and quality are compounded into complex quality challenges when collections are aggregated. Current metadata quality assurance is generally piecemeal, reactive, ad hoc, and atheoretical; formal compatibility and interoperability standards often prove unenforceable given metadata providers' dynamic and conflicting organizational priorities. We are empirically examining large bodies of harvested metadata to develop systematic techniques for metadata quality assessment and assurance. We study metadata quality, value, and cost models; algorithms for connecting metadata component variations to (aggregate) metadata record quality; and prototype metadata quality assurance tools that help providers, aggregators, and users reason about metadata quality, enabling more intelligent selection, aggregation, and maintenance of metadata.

  3. 2. Approach [Model diagram: activity; metadata quality; outcome quality; outcome value; information use] The model has been developed using a number of techniques, including literature analysis, case studies, statistical analysis, strategic experimentation, and multi-agent modeling. The model, along with its concepts and metrics, can serve as a foundation for developing effective quality-assurance methodologies tailored to various types of organizations. Our model of metadata quality ties together findings from existing and new research in information quality, well-developed work in information seeking/use behavior, and the techniques of strategic experimentation from manufacturing. It presents a holistic approach to determining the quality of a metadata object, identifying quality requirements based on typified contexts of metadata use (such as specific information seeking/use activities), and expressing interactions between metadata quality and metadata value.

  4. 3. Measuring Metadata Quality 3.1 The Metadata Quality Problem [Diagram: Activity; MM; Model Schema / Genre; II] • Actual quality not matching the required/needed level of quality • May arise at different levels: • Element level • Schema level • Quality dimensions

  5. 3.2 Information Quality Dimensions • Intrinsic: Accuracy, Cohesiveness, Complexity, Semantic consistency, Structural consistency, Currency, Informativeness, Naturalness, Precision • Relational / Contextual: Accuracy, Completeness, Complexity, Latency, Naturalness, Informativeness, Relevance (aboutness), Precision, Security, Verifiability, Volatility • Reputational: Authority

  6. 3.3 MQ Dimensions May Trade Off • completeness vs. simplicity • robustness vs. simplicity • volatility vs. simplicity • robustness vs. redundancy • accessibility vs. certainty • … Taguchi curves help to model and reason about tradeoffs. [Figure: Taguchi quality-loss curves plotting loss against a quality characteristic qi for LIB (larger is better), SIB (smaller is better), and NIB (nominal is best) dimensions]
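The tradeoff reasoning above can be sketched with the three standard Taguchi loss shapes. This is a minimal illustration, not the project's model: the loss constant k, the target values, and the mapping of dimensions to loss shapes are assumptions made for the example.

```python
# Sketch of the three Taguchi quality-loss shapes used to reason about
# metadata-quality tradeoffs. k and the target are illustrative.

def loss_nib(q, target, k=1.0):
    """Nominal-is-best: loss grows as q departs from the target in
    either direction (e.g. completeness vs. simplicity)."""
    return k * (q - target) ** 2

def loss_sib(q, k=1.0):
    """Smaller-is-better: loss grows with q (e.g. redundancy)."""
    return k * q ** 2

def loss_lib(q, k=1.0):
    """Larger-is-better: loss shrinks as q grows (e.g. robustness)."""
    return k / q ** 2

# Combined loss over one quality characteristic: pushing q past the
# target stops paying off, so total loss is minimized in between.
for q in (0.25, 0.5, 0.75, 1.0):
    print(q, round(loss_nib(q, target=0.5) + loss_lib(q), 3))
```

The quadratic forms are the textbook Taguchi shapes; summing two of them over one characteristic is just a way to visualize a tradeoff between dimensions.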

  7. 3.4 Genre Captures Context [Diagram of the local context: MD Genre; Culture/Activity; Collection Metadata; MQ Assessment; MQ Metrics; Weights]

  8. 4. Measuring Value 4.1 What’s the Value of Quality?

  9. 4.2 Value as Amount of Use • The value of metadata can be a function of the probability distribution of the operations/transactions using the metadata. • Human-factors experiments can be used to assess the effectiveness of creating and using the metadata. • Metadata is often an organizational asset, especially in organizations like libraries, and one can calculate its dollar cost based on the average time a cataloger spends creating a record or an element of a record.
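The two valuation ideas on this slide can be sketched in a few lines: value as expected use, and cost from average cataloger time. The function names and every number below are illustrative assumptions, not figures from the project.

```python
# Illustrative sketch, not the project's model: metadata value as
# expected use, and record cost from cataloging time. All numbers
# are invented for the example.

def expected_use_value(use_probs, value_per_use):
    """Value as a function of the probability distribution of the
    operations/transactions that use the metadata."""
    return sum(p * v for p, v in zip(use_probs, value_per_use))

def cataloging_cost(minutes_per_record, hourly_wage):
    """Dollar cost of one record from average cataloger time."""
    return minutes_per_record / 60.0 * hourly_wage

# Three hypothetical operations (find, identify, obtain) with assumed
# usage probabilities and per-use values:
value = expected_use_value([0.5, 0.3, 0.2], [1.0, 0.8, 2.0])
cost = cataloging_cost(minutes_per_record=12, hourly_wage=30.0)
print(value, cost)  # expected value per use vs. creation cost
```

Comparing the two quantities is what lets an aggregator ask whether improving (or even keeping) an element is worth its cost.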

  10. 5. IMLS Digital Collections and Content Project • Promote centralized search, interoperability, and reusability of metadata collections • Harvested metadata from >20 data providers; >150,000 Dublin Core records (and growing) • Data providers: small public libraries and historical societies, large academic libraries, museums, research centers • Records provided: from dozens to tens of thousands • Interoperability and reusability require negotiating global quality • http://imlsdcc.grainger.uiuc.edu

  11. 5.1 Examples of Quality Problems Activity: find & collocate across federated collections (actions: find, identify, select, obtain). • Name variants: Ptolemaios son of Diodoros; Dioskoros Ptolemaios; Dioscorus; Ptolemaios (variant transliterations) • <date>2000</date> vs. <date>1998-03-26</date> (ambiguous and structurally inconsistent) • <publisher>New York: Robert Carter, 1846</publisher> (schema limitation led to a workaround) • …
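Problems of the second and third kind can be caught with simple pattern checks. The sketch below is hypothetical tooling, not the project's: the accepted date shapes and the publisher heuristic are assumptions chosen to match the examples above.

```python
import re

# Hypothetical checks (not the project's actual tools) for two of the
# DC problems shown above: inconsistent/ambiguous <date> values and a
# <publisher> overloaded with place and date as a schema workaround.

DATE_PATTERNS = [r"^\d{4}$", r"^\d{4}-\d{2}-\d{2}$"]  # assumed shapes

def date_problem(value):
    """Flag a <date> that matches no expected shape; note that even a
    bare year is ambiguous (created? published? digitized?)."""
    if not any(re.match(p, value) for p in DATE_PATTERNS):
        return "structurally inconsistent"
    if re.match(r"^\d{4}$", value):
        return "ambiguous (which event does the year refer to?)"
    return None

def publisher_problem(value):
    """Flag a <publisher> that smuggles in a place or date."""
    return "overloaded" if re.search(r"\d{4}", value) or ":" in value else None

print(date_problem("2000"))
print(date_problem("26/03/1998"))
print(publisher_problem("New York: Robert Carter, 1846"))
```

Checks like these are cheap per-record screens; they cannot resolve name-variant collocation, which needs authority data rather than regexes.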

  12. 5.2 Findings MQ dimensions with major quality problems: • completeness • redundancy • clarity • semantic inconsistency (incorrect element use) • structural inconsistency • inaccurate representation
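Two of the dimensions above, completeness and redundancy, admit simple per-record metrics. The definitions below are illustrative sketches, not the metrics the project used; only the fifteen simple Dublin Core element names are taken as given.

```python
# Hedged sketch: per-record scores for two dimensions from the list
# above. The metric definitions are illustrative, not the project's.

DC_ELEMENTS = {"title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"}

def completeness(record):
    """Fraction of the 15 DC elements with a non-empty value.
    `record` maps element name -> list of string values."""
    filled = {e for e, vals in record.items() if any(v.strip() for v in vals)}
    return len(filled & DC_ELEMENTS) / len(DC_ELEMENTS)

def redundancy(record):
    """Fraction of values that duplicate another value of the same
    element (repeated <date>1846</date>, say)."""
    total = dups = 0
    for vals in record.values():
        total += len(vals)
        dups += len(vals) - len(set(vals))
    return dups / total if total else 0.0

rec = {"title": ["Letter, 1846"], "date": ["1846", "1846"], "creator": [""]}
print(round(completeness(rec), 3), round(redundancy(rec), 3))
```

Scores like these are what make the dimensions comparable across thousands of harvested records.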

  13. 5.3 Findings Correlation between consistency of element use and the type of metadata objects and type of data providers (sample size 2,000). • Grouping by type of objects made the standard deviation of the total number of elements used drop significantly (from 5.73 to 3.6) • Clustering by use of distinct DC elements (k-means, with 2 clusters) suggested that different types of institutions may use different numbers of distinct DC elements: • Academic libraries: 13 • Public libraries: 8 • Museums: divided
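The clustering step can be re-created in miniature: one-dimensional k-means with k = 2 over the number of distinct DC elements each provider uses. The per-provider counts below are invented so that the cluster means land near the slide's figures (8 and 13); the real analysis used the harvested sample.

```python
# Minimal 1-D k-means (k=2) over distinct-DC-element counts per
# provider. Counts are made up to illustrate the two-cluster result.

def kmeans_1d(xs, k=2, iters=50):
    centers = [min(xs), max(xs)][:k]          # spread initial centers
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:                          # assign to nearest center
            i = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[i].append(x)
        centers = [sum(g) / len(g) if g else c  # recompute means
                   for g, c in zip(groups, centers)]
    return sorted(centers)

# Hypothetical counts: public libraries near 8 distinct elements,
# academic libraries near 13.
counts = [7, 8, 8, 9, 12, 13, 13, 14]
print(kmeans_1d(counts))  # → [8.0, 13.0]
```

With clean 1-D data like this the centers converge in one pass; "museums: divided" corresponds to providers whose counts fall between the two centers.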

  14. 5.3 Findings (cont.) • High complexity of metadata content is related to the number of quality problems • Strong correlation found between Content Simplicity/Complexity Rate and Quality Problem Rate (-.434, p < .01) • However, no significant correlation found between Quality Problem Rate and Length of Metadata Object (.043) • Differences in how well standard schemas handle different types of original objects; the lowest quality problem rate was found for print materials
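The correlation check behind these figures is a plain Pearson r. The sketch below shows the computation on invented per-record rates; the slide's actual values were r = -.434 (p < .01) against the simplicity rate and .043 against record length.

```python
import math

# Pearson correlation between a content simplicity rate and a
# quality-problem rate per record. Sample values are invented.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

simplicity = [0.9, 0.8, 0.6, 0.4, 0.3]
problems   = [0.1, 0.2, 0.2, 0.5, 0.6]
print(round(pearson_r(simplicity, problems), 3))  # negative, as on the slide
```

A significance test (the slide's p < .01) would additionally require the sample size, e.g. via a t-statistic on r.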

  15. 6. Conclusions and Lessons Learned • Communities of practice may use their own implicit or explicit schema when sharing metadata, even through a standardized schema such as DC • Some schema elements can be more ambiguous than others and require qualification: Date vs. Creator • Ambiguity of schema elements can be a major source of quality problems, leading to context loss and element misuse • Inferring the native schema and comparing it to the destination schema can point to possible sources of quality problems • Analysis of activities can help in evaluating the robustness and clarity of a schema • Mining regularities between metadata characteristics and quality problems can help in constructing robust and inexpensive metrics • Some metrics used in information retrieval (Infonoise, Kullback-Leibler divergence, average IDF) can be effective and scalable in assessing quality at the content level • A general-purpose dictionary-based metric was found robust for assessing the cognitive complexity of metadata content • Structure profiles can be an effective source for measuring quality and predicting quality problems at the schema level
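Two of the IR-style metrics named here can be sketched directly. The exact formulations the project used are not on the slide, so the smoothing constant and the sample corpus below are illustrative assumptions.

```python
import math

# Hedged sketch of two content-level metrics named above; the
# smoothing (eps) and the toy corpus are illustrative assumptions.

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) between a record's term distribution and a reference
    corpus distribution; high divergence can flag unusual content."""
    pt = sum(p_counts.values())
    qt = sum(q_counts.values())
    return sum((c / pt) * math.log((c / pt) / (q_counts.get(w, 0) / qt + eps))
               for w, c in p_counts.items())

def avg_idf(terms, doc_freq, n_docs):
    """Average inverse document frequency of a record's terms; low
    values suggest generic, uninformative metadata content."""
    return sum(math.log(n_docs / (1 + doc_freq.get(t, 0)))
               for t in terms) / len(terms)

record = "portrait of a farm family".split()
corpus_df = {"portrait": 40, "of": 900, "a": 950, "farm": 25, "family": 60}
print(round(avg_idf(record, corpus_df, n_docs=1000), 3))
```

Both are cheap to compute over harvested text, which is what makes them candidates for scalable content-level quality assessment.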

  16. Acknowledgements and Contact Information This research was made possible by generous support from the Institute of Museum and Library Services (IMLS) and the UIUC Campus Research Board. Contact: Besiki Stvilia, stvilia@uiuc.edu
