The Economy of Distributed Metadata Authoring, by Stefano Mazzocchi. Experts' Workshop - Perspectives on networked knowledge spaces, 25/26 October 2002, Sankt Augustin, Germany. Organised by: MARS Exploratory Media Lab at the Fraunhofer Institut für Medienkommunikation.
What is Metadata • Metadata is information about information
Classic Examples of Metadata • Keywords • Author • Date of creation/modification • Address/Identifier
More provocative examples • Punctuation in text • Layout on a page • Font size/weight/style in text • Commentary audio tracks on DVD
General Metadata Properties • Metadata is data about data, but it’s still data • Metadata should be semantically orthogonal: data should be understandable even without metadata
Markup and Metadata • Markup languages can be seen as metadata-driven languages. • Markup syntax is designed to keep data and metadata orthogonal
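The orthogonality claim can be illustrated with a minimal sketch, assuming an HTML-like fragment: stripping the markup leaves data that is still readable, while the tags and attributes form a separate metadata layer. The `MarkupSplitter` class, the `split_markup` helper and the sample string are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects character data and start tags into two separate lists."""

    def __init__(self):
        super().__init__()
        self.data_parts = []  # the data: plain text content
        self.metadata = []    # the metadata: (tag, attributes) pairs

    def handle_starttag(self, tag, attrs):
        self.metadata.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.data_parts.append(data)


def split_markup(text):
    """Return (plain_text, tags) for a marked-up string."""
    parser = MarkupSplitter()
    parser.feed(text)
    parser.close()
    return "".join(parser.data_parts), parser.metadata


# Hypothetical sample: the sentence stays understandable without its tags.
SAMPLE = '<p>Metadata is <em class="key">information</em> about information</p>'
plain_text, tags = split_markup(SAMPLE)
```

Here `plain_text` holds the sentence on its own and `tags` holds the markup that was layered on top of it, which is the separation the slide describes.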
The Importance of Metadata • Key to semantic analysis • Key to multidimensional augmentation of information • Key to information relationability • In short: key to more powerful datamining
Types of Metadata • Human authored • Automatically Inferred
Human Authoring (1) • In-process: data and metadata are created at the same time • Out-of-process: metadata is added after data has been created
Human Authoring (2) • By the data author: data and metadata are written by the same person • By another author: data and metadata are created by different people
Automatic Inference • Recognition of patterns and trends in data • Semantic assumption of data-metadata correlations
Types of Automatic Inference • Heuristic: an algorithm analyses the data set (artificial reproduction of intelligent behavior) • Transparent: mechanically extracted information is transparently associated with metadata that results from human semantic analysis
Transparent Inference Examples • Google’s PageRank • Amazon’s related items • NEC’s CiteSeer
Google • PageRank is the system that ranks the pages returned by a query against Google's database • It works by analysing hyperlink topology • Metadata is inferred from the hyperlinks contained in each page
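Hyperlink topology analysis can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration, not Google's production system: the damping factor of 0.85, the iteration count, the even redistribution of rank from pages without outgoing links, and the toy `LINKS` graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: nobody links to "d", three pages link to "c".
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
best = max(ranks, key=ranks.get)
```

No page author declared "c" to be important; its score emerges from links that authors created for their own reasons, which is what makes the inference transparent.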
Amazon • Relations between items are inferred by analysing what other users have bought • A user buying two products is taken as a sign that the items are related • Simply by buying, users collectively fill in product metadata on relations
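The co-purchase idea can be sketched as simple pair counting over baskets. This is a minimal illustration under stated assumptions, not Amazon's actual algorithm: the `co_purchase_counts` and `related_items` functions and the sample `BASKETS` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each unordered pair of items was bought together."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (a, b) key with a < b.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts


def related_items(item, counts, top_n=3):
    """Return the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), count in counts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical purchase history: each inner list is one user's basket.
BASKETS = [
    ["xml-book", "rdf-book"],
    ["xml-book", "rdf-book", "java-book"],
    ["xml-book", "java-book"],
    ["rdf-book", "topic-maps-book"],
    ["xml-book", "rdf-book"],
]
counts = co_purchase_counts(BASKETS)
related = related_items("xml-book", counts)
```

Nobody in this sketch ever authors a "related to" field; the relation metadata is a by-product of purchases the users were making anyway.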
CiteSeer • Digital library of IT papers • Ranks search results by analysing citation topology • Bibliographies become the source of relevance metadata
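Treating bibliographies as relevance metadata can be sketched by counting inbound citations and sorting search matches by that count. This is an illustrative simplification, not CiteSeer's actual ranking: the function names and the sample `BIBLIOGRAPHIES` are assumptions.

```python
from collections import Counter


def citation_counts(bibliographies):
    """Count inbound citations; bibliographies maps paper -> papers it cites."""
    counts = Counter()
    for paper, cited in bibliographies.items():
        for target in set(cited):
            if target != paper:  # ignore self-citations
                counts[target] += 1
    return counts


def rank_results(matches, counts):
    """Order matching papers by citation count, most cited first."""
    return sorted(matches, key=lambda paper: (-counts[paper], paper))


# Hypothetical library: each paper lists the papers in its bibliography.
BIBLIOGRAPHIES = {
    "paper-a": ["paper-c"],
    "paper-b": ["paper-a", "paper-c"],
    "paper-c": [],
    "paper-d": ["paper-a", "paper-c"],
}
counts = citation_counts(BIBLIOGRAPHIES)
ranked = rank_results(["paper-a", "paper-b", "paper-c"], counts)
```

The authors wrote their bibliographies to credit sources, not to rank anyone, yet the same data doubles as a relevance signal at no extra authoring cost.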
The Issues with Metadata • Quality of metadata heavily influences the quality of all search/retrieval systems
First Law of Metadata Quality • Artificial intelligence is just that: artificial! • So: for a system that feels smart to humans, you need human-created metadata
First Law of Metadata Quantity • The more high-quality metadata, the better. • But: the more human-created metadata, the more expensive the authoring process gets.
Metrics • To estimate the value of proposed technological solutions, a metric is required • Economic feasibility is one possible metric
Consequences • All current markup-based semantic web solutions (RDF, topic maps, ontologies) are economically infeasible. • The best semantic solutions are those based on transparent inference
Suggestions (1) • Factor metadata authoring costs into technology decisions. • Don’t underestimate the importance of how the system feels to its users. • Think about what can be inferred transparently without requiring heuristics
Suggestions (2) • Make every effort to deliver an instant return on the investment in metadata authoring • Don’t ask too much • Be smart but not smarter