
The Economy of Distributed Metadata Authoring


Presentation Transcript


  1. The Economy of Distributed Metadata Authoring by Stefano Mazzocchi Experts' Workshop - Perspectives on networked knowledge spaces 25/26 October 2002, Sankt Augustin, Germany Organised by: MARS Exploratory Media Lab at the Fraunhofer Institut für Medienkommunikation

  2. What is Metadata • Metadata is information about information

  3. Classic Examples of Metadata • Keywords • Author • Date of creation/modification • Address/Identifier
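  For concreteness, a minimal Python sketch of such classic fields attached to a document (the field names and values are only illustrative, not any particular standard):

      # Illustrative metadata record for a single document
      document_metadata = {
          "keywords": ["metadata", "authoring", "semantic web"],
          "author": "Stefano Mazzocchi",
          "created": "2002-10-25",
          "modified": "2002-10-26",
          "identifier": "http://example.org/talks/metadata-economy",
      }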

  4. More provocative examples • Punctuation in text • Layout on a page • Font size/weight/style in text • Commentary audio tracks on DVD

  5. General Metadata Properties • Metadata is data about data, but it’s still data • Metadata should be semantically orthogonal: data should be understandable even without metadata

  6. Markup and Metadata • Markup languages can be seen as metadata-driven languages. • Markup syntax is designed to keep data and metadata orthogonal
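  A minimal Python illustration of this orthogonality, assuming a generic tag-based markup: stripping the markup (metadata) still leaves the data readable on its own.

      import re

      # Markup carries metadata about the text without replacing the text itself
      marked_up = "<p>Metadata is <em>information</em> about information.</p>"

      # Removing the tags leaves data that is understandable without the metadata
      plain = re.sub(r"<[^>]+>", "", marked_up)
      print(plain)  # Metadata is information about information.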

  7. The Importance of Metadata • Key to semantic analysis • Key to multidimensional augmentation of information • Key to relating pieces of information to one another • In short: key to more powerful data mining

  8. Types of Metadata • Human authored • Automatically Inferred

  9. Human Authoring (1) • In-process: data and metadata are created at the same time • Out-of-process: metadata is added after data has been created

  10. Human Authoring (2) • By the data author: data and metadata are written by the same person • By another author: data and metadata are created by different people

  11. Automatic Inference • Recognition of patterns and trends in data • Semantic assumptions about data-metadata correlations

  12. Types of Automatic Inference • Heuristic: an algorithm performs analysis on the data set (an artificial reproduction of intelligent behavior) • Transparent: mechanically extracted information is transparently associated with metadata, with the semantic analysis performed by humans

  13. Transparent Inference Examples • Google’s PageRank • Amazon’s related items • NEC’s CiteSeer

  14. Google • PageRank is the system that ranks the pages returned by a query against Google's database • It works by hyperlink topology analysis • Metadata is inferred from the hyperlinks contained in each page
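  A minimal Python sketch of hyperlink topology analysis, assuming the standard textbook PageRank iteration rather than Google's actual implementation:

      def pagerank(links, damping=0.85, iterations=50):
          """links: dict mapping each page to the list of pages it links to."""
          pages = set(links) | {p for targets in links.values() for p in targets}
          rank = {p: 1.0 / len(pages) for p in pages}
          for _ in range(iterations):
              new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
              for page, targets in links.items():
                  for target in targets:
                      # Each page passes a share of its rank along its outgoing links
                      new_rank[target] += damping * rank[page] / len(targets)
              rank = new_rank
          return rank

      # Toy example: three pages linking to one another
      print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))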

  15. Amazon • Relations between items are inferred by analyzing the articles bought by other users • A user buying two products together is taken as a sign of a relation between those items • Simply by buying, users collectively fill in product relation metadata
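  A toy Python sketch of this kind of transparent inference, assuming simple co-purchase counting over made-up baskets (Amazon's real system is of course more elaborate):

      from collections import Counter
      from itertools import combinations

      # Hypothetical purchase baskets: each set is one user's order
      orders = [
          {"book_a", "book_b"},
          {"book_a", "book_b", "dvd_x"},
          {"book_b", "dvd_x"},
      ]

      # Count how often each pair of items is bought together
      co_bought = Counter()
      for basket in orders:
          for pair in combinations(sorted(basket), 2):
              co_bought[pair] += 1

      # Items most strongly related to "book_a", by co-purchase count
      related = [(pair, n) for pair, n in co_bought.items() if "book_a" in pair]
      print(sorted(related, key=lambda item: -item[1]))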

  16. CiteSeer • Digital library of IT papers • Ranks search results by citation topology analysis • Bibliographies become the source of relevance metadata
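  A toy Python sketch of citation topology as a source of relevance metadata, using made-up bibliography data (not CiteSeer's actual ranking):

      from collections import Counter

      # Hypothetical data: each paper mapped to the papers its bibliography cites
      bibliographies = {
          "paper_a": ["paper_c"],
          "paper_b": ["paper_c", "paper_a"],
          "paper_d": ["paper_a"],
      }

      # A paper's relevance is approximated by how often it is cited
      citations = Counter(cited for refs in bibliographies.values() for cited in refs)
      print(citations.most_common())  # most-cited papers first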

  17. The Issues with Metadata • Quality of metadata heavily influences the quality of all search/retrieval systems

  18. First Law of Metadata Quality • Artificial intelligence is just that: artificial! • So: for a system that feels smart to humans, you need human-created metadata

  19. First Law of Metadata Quantity • The more high-quality metadata, the better. • But: the more human-created metadata, the more expensive the authoring process gets.

  20. Metrics • In order to estimate the value of proposed technological solutions, a metric is required • Economic feasibility is one possible metric

  21. Consequences • All current markup-based semantic web solutions (RDF, topic maps, ontologies) are economically infeasible. • The best semantic solutions are those based on transparent inference

  22. Suggestions (1) • Plan for the impact of metadata authoring costs when making technology decisions • Don't underestimate the importance of how the system feels to users • Think about what can be inferred transparently, without requiring heuristics

  23. Suggestions (2) • Make every effort to provide an instant return on the investment of metadata authoring • Don't ask too much of users • Be smart, but not smarter

  24. Thanks!
