Publishing Cultural Heritage? Semantic Multimedia Content Management
Barnabás Szász – University of Debrecen, Hungary
Situation
• A large amount of non-digital resources
• Digitisation is the first step, not the solution
• Tools are needed for
  • Digitising/creating multimedia documents
  • Managing documents
  • Describing/annotating documents
  • Publishing documents
  • Browsing published documents
  • Finding relevant documents
Digital (Heritage) Resource Life-cycle
• Acquisition: digitisation, purchase, creation
• Description: metadata, ontologies, semantic annotations
• Access: portal technology, searching/browsing support tools, eLearning modules
Content Management
• Content management is a set of processes and technologies that support the evolutionary life cycle of digital information. This digital information is often referred to as content or, more precisely, digital content. Digital content may take the form of text (such as documents), multimedia files (such as audio or video files), or any other file type that follows a content life cycle requiring management.
Challenges on the management side
• What features and functionality does a CMS offer?
  • In-context contribution, preview, updates and approvals
  • Template-based pages with reusable components
  • Native content conversion to Web formats (HTML, XML, WML, etc.)
  • WYSIWYG editor for form-based content
  • Dynamic delivery and scheduled publishing models
  • Web content release and expiration
• Additional content management functionality required
  • Library services (search, check in/out, version control, subscription)
  • Flexible metadata and security
  • Archiving and replication
Successful Pilot Project
• Museum 24 – Semantic Virtual Museum (http://www.museo24.fi)
• The story of the Jämsä region (Central Finland)
• Topics covered
  • Architecture and buildings
  • History of work – local forestry and aviation industry
  • Museums
  • Local news from the past
  • Old maps
  • Memories
  • Old photos
Multimedia Content
• Dealing with huge files
• Administration interface for different media formats
  • Image manipulation
  • Video editing / annotation
• Additional technology
  • Streaming
  • File format / attribute conversion
Metadata
• Metadata is structured, encoded data that describes characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities.
• Used to facilitate the understanding, use and management of data
• Each metadata item always has a name and a value
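The "name and value" idea can be sketched in a few lines of Python. This is only an illustration: the document ids, field names and values below are assumptions made up for the example, not data from Museum24.

```python
# Hypothetical records: each document id maps to a list of (name, value) pairs.
records = {
    "photo-001": [("title", "Old church"), ("creator", "unknown"), ("year", "1895")],
    "map-007": [("title", "Regional map"), ("year", "1902")],
}


def lookup(records, name, value):
    """Return the ids of documents that carry the given name/value pair."""
    return sorted(
        doc_id for doc_id, fields in records.items() if (name, value) in fields
    )


if __name__ == "__main__":
    print(lookup(records, "year", "1895"))
```

Keeping metadata as plain name/value pairs is what makes discovery possible: a search only needs to compare names and values, regardless of the media type being described.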
Examples of Metadata
• Scientific materials: abstract, keywords, title, authors of a paper
• Geospatial metadata
• Audio: ID3 tags of MP3 files
• Image: EXIF tags of JPEG files
• File system: permissions, file attributes, etc.
Ontology
• An ontology is a shared conceptualization of a domain
• An ontology is a set of definitions in a formal language for terms describing the world
[Figure: the ice-cream ontology]
Ontology vs. Taxonomy
• taxonomy: a classification based on similarities.
[Figure: example taxonomy]
Thing
  Entity
    Living being
      Human
        Man
        Woman
    Inert entity
      Document
        Book
          Novel
          Short story
  Event
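Ontology vs. Partonomy
• partonomy: a classification based on the part-of relation.
[Figure: chemical partonomy in three levels]
  Compounds: CH4 (methane), C2H6 (ethane), CH3-OH (methanol), C2H5-OH (ethanol), etc.
  Molecules and groups: O3 (ozone), H2 (dihydrogen), -OH (hydroxyl), CO2 (carbon dioxide), H2O (water), O2 (dioxygen), -CH3 (methyl)
  Atoms: C (carbon), O (oxygen), H (hydrogen)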
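A partonomy can be sketched as a table of has-part edges plus a transitive walk over them. The edges below are an assumption loosely based on the chemistry example (which groups belong to which compound is not stated on the slide), and `HAS_PART` and `all_parts` are hypothetical names.

```python
# Hypothetical has-part edges: whole -> direct parts.
HAS_PART = {
    "methanol": ["methyl", "hydroxyl"],   # CH3-OH
    "methyl": ["carbon", "hydrogen"],     # -CH3
    "hydroxyl": ["oxygen", "hydrogen"],   # -OH group
    "water": ["oxygen", "hydrogen"],      # H2O
}


def all_parts(whole, has_part=HAS_PART):
    """Collect the direct and indirect parts of `whole` (transitive closure)."""
    found = set()
    stack = list(has_part.get(whole, []))
    while stack:
        part = stack.pop()
        if part not in found:
            found.add(part)
            # Parts of a part are also parts of the whole.
            stack.extend(has_part.get(part, []))
    return found


if __name__ == "__main__":
    print(sorted(all_parts("methanol")))
```

Unlike an is-a hierarchy, nothing is inherited along these edges: methanol has a methyl group as a part, but methanol is not a kind of methyl.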
Logical Theory of Ontologies
• formal definitions (knowledge factorization)
  director(x) ⇒ person(x) ∧ (∃y organization(y) ∧ manage(x,y))
• causal relations
  living_being(y) ∧ salty(x) ∧ eat(y,x) ⇒ thirsty(y)
• An ontology is not a taxonomy. A taxonomy may be an ontology.
CIDOC-CRM
• The CIDOC CRM is a domain ontology
• It approximates relevant expert conceptualizations underlying major documentation and metadata formats of material cultural heritage and beyond.
• It provides an extensible vocabulary and an extensible structure about possible states of affairs as a unique reference in order to transform and merge data of different structures and to drive queries (mediation).
• It is not a fusion of existing formats, but a product of expert insight and intensive interdisciplinary work. As such, it builds on a metaschema/metamodel of fundamental categories and causal relationships with explanatory power.
CRM Structures
• Classes: The class hierarchy contains the conceptual building blocks of the CRM. There is an is_a relationship between sub-classes and super-classes: Activity is an (ISA) Event.
• Properties: These provide the specific relationships between the classes. A property acts like a verb, demanding both a domain and a range, and is bi-directional:
  • Physical Man-Made Stuff depicts CRM Entity
  • CRM Entity is depicted by Physical Man-Made Stuff
• Inheritance: Sub-classes inherit the properties of their super-classes. Multiple inheritance means that a sub-class may have more than one super-class (in which case it inherits the properties of all its parents).
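The three structures can be sketched as a small Python model. This is a simplified illustration, not the CIDOC CRM itself: the `CrmClass` type is hypothetical, and the "Information Carrier" and "Painted Sign" classes and the "carries" and "has type" property assignments are assumptions added only to show multiple inheritance.

```python
class CrmClass:
    """A class with optional super-classes and properties declared on it."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.properties = {}  # property name -> name of the range class

    def is_a(self, other):
        """True if this class is `other` or a direct/indirect sub-class of it."""
        return self is other or any(p.is_a(other) for p in self.parents)

    def all_properties(self):
        """Own properties plus everything inherited from all super-classes."""
        merged = {}
        for parent in self.parents:
            merged.update(parent.all_properties())
        merged.update(self.properties)
        return merged


# Each property has an inverse reading, which makes it bi-directional.
INVERSE = {"depicts": "is depicted by"}

entity = CrmClass("CRM Entity")
entity.properties["has type"] = "Type"  # illustrative property

event = CrmClass("Event", [entity])
activity = CrmClass("Activity", [event])

physical_stuff = CrmClass("Physical Man-Made Stuff", [entity])
physical_stuff.properties["depicts"] = "CRM Entity"

carrier = CrmClass("Information Carrier", [entity])  # hypothetical class
carrier.properties["carries"] = "Information Object"  # hypothetical property

# Multiple inheritance: one sub-class with two super-classes.
painted_sign = CrmClass("Painted Sign", [physical_stuff, carrier])

if __name__ == "__main__":
    print(activity.is_a(event))
    print(sorted(painted_sign.all_properties()))
```

Storing only the locally declared properties on each class and merging on demand keeps the model small; a sub-class with several parents simply merges the property sets of all of them.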
MPEG-7 extension
Requirements for the underlying ontology:
• All the multimedia metadata should be stored in the ontology model
• An ontology model was needed that
  • is suitable for describing cultural heritage information
  • supports describing multimedia
• Jane Hunter: Combining the CIDOC CRM and MPEG-7 to Describe Multimedia in Museums
Named Entity Recognition
• Named entity recognition is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
• Used for semi-automatic annotation
  • Parsing documents to find known names
• Named entity sources
  • ULAN – Union List of Artist Names (Getty)
  • AAT – Art & Architecture Thesaurus (Getty)
  • TGN – Getty Thesaurus of Geographic Names (Getty)
  • Wikipedia entries
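Parsing a document for known names can be sketched as a dictionary (gazetteer) lookup. The sketch below is an assumption about how such a step might look, not the project's implementation; the `GAZETTEER` entries are hypothetical stand-ins for names that a real system could load from ULAN, AAT, TGN or Wikipedia.

```python
import re

# Hypothetical gazetteer: known name -> category.
GAZETTEER = {
    "University of Debrecen": "organization",
    "Debrecen": "location",
    "Jämsä": "location",
    "Central Finland": "location",
}


def find_known_names(text, gazetteer=GAZETTEER):
    """Return (start, end, name, category) for each known name found in text."""
    hits = []
    # Try longer names first so that a long name wins over a shorter one inside it.
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            overlaps = any(m.start() < end and start < m.end()
                           for start, end, _, _ in hits)
            if not overlaps:
                hits.append((m.start(), m.end(), name, gazetteer[name]))
    return sorted(hits)


if __name__ == "__main__":
    sample = "The University of Debrecen studied photos from Jämsä."
    for start, end, name, category in find_known_names(sample):
        print(start, end, name, category)
```

The matches are only suggestions, which is why the annotation is semi-automatic: a person still confirms or rejects each proposed entity.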
Collaborative Ontology Engineering
• Named entities collection
• Recognition
  • During the annotation process
  • New entities are created mostly at this step
• Pending term description
  • Needs a domain expert
  • Relations are created at this step
Basic Ontology Editor – Annotation Tool
• Reduced view of the CIDOC ontology: Who? What? Where? When?
How does AJAX work?
• AJAX stands for Asynchronous JavaScript and XML
• Based on the XMLHTTP/XMLHttpRequest object
• Refreshes part of the page without reloading the whole page
• Communicates with the server in the background
LiveSearch
• How does it work? It is AJAX based
• Why do we need LiveSearch? Aim: build and use controlled vocabularies
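The server side of such a live search can be sketched as a prefix lookup over the controlled vocabulary, called on every AJAX request as the user types. This is an assumption about one possible design, not the Museum24 code; `build_index`, `live_search` and the vocabulary terms are hypothetical.

```python
import bisect


def build_index(terms):
    """Sort case-folded terms once so that prefix lookups can use binary search."""
    return sorted((term.casefold(), term) for term in set(terms))


def live_search(index, typed, limit=5):
    """Return up to `limit` vocabulary terms that start with the typed text."""
    prefix = typed.casefold()
    if not prefix:
        return []
    # (prefix,) sorts before every (key, term) whose key starts with prefix.
    start = bisect.bisect_left(index, (prefix,))
    results = []
    for key, term in index[start:]:
        if not key.startswith(prefix) or len(results) == limit:
            break
        results.append(term)
    return results


# Hypothetical controlled vocabulary.
VOCABULARY = build_index(["church", "chapel", "Church bell", "sawmill", "school"])

if __name__ == "__main__":
    print(live_search(VOCABULARY, "ch"))
```

Offering existing terms while the annotator types nudges people towards reusing a term instead of inventing a near-duplicate, which is how the vocabulary stays controlled.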
Search
• Keyword based
• Full text and ontology index
• Structured metadata
• Faceted navigation
• AJAX-based tools
• Semantic autocompletion
Faceted Navigation
• A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order.
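Faceted navigation can be sketched as filtering on any combination of facets and then counting the remaining values of another facet. The items and facet names below are assumptions invented for the example.

```python
from collections import Counter

# Hypothetical items: each object carries several independent classifications.
ITEMS = [
    {"id": 1, "type": "photo", "topic": "architecture", "decade": "1890s"},
    {"id": 2, "type": "map", "topic": "forestry", "decade": "1900s"},
    {"id": 3, "type": "photo", "topic": "forestry", "decade": "1900s"},
]


def filter_items(items, selected):
    """Keep the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)


if __name__ == "__main__":
    photos = filter_items(ITEMS, {"type": "photo"})
    print(facet_counts(photos, "topic"))
```

Because the facets are independent, the user can start from any of them (type, topic or decade) and narrow down in any order, which is the contrast with a single fixed taxonomy.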
Ontology-supported search
• Searching in the ontology
• Displaying related content
Search Query Sample
SELECT ?x
FROM <http://imnetti.fi/vmuseum/museum24>
WHERE (?x, rdf:type, mpeg-7:Image),
      (?x, cidoc:shows_visual_item, ?y),
      (?y, rdf:type, cidoc:Man-Made_Object),
      (?y, cidoc:has_type, 'church'),
      (?y, cidoc:is_located_on, ?z),
      (?y, cidoc:beginning_is_qualified_by, ?year),
      (?z, cidoc:borders_with, 'Budapest')
AND ?year < 1900
USING rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      cidoc FOR <http://imnetti.fi/vmuseum/cidoc#>
      mpeg-7 FOR <http://imnetti.fi/vmuseum/cidoc-mpeg7#>
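How a query like this is evaluated can be sketched with a tiny in-memory matcher: every triple pattern must match some stored triple, shared variables must take the same value, and the numeric filter is applied at the end. This is a simplified illustration, not a real RDQL engine; the `match` function and the sample triples are hypothetical, and only a subset of the query's patterns is used.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every pattern (variables start with '?')."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Hypothetical data: two images of churches, one dated before 1900.
TRIPLES = [
    ("img1", "rdf:type", "mpeg-7:Image"),
    ("img1", "cidoc:shows_visual_item", "obj1"),
    ("obj1", "cidoc:has_type", "church"),
    ("obj1", "cidoc:beginning_is_qualified_by", 1885),
    ("img2", "rdf:type", "mpeg-7:Image"),
    ("img2", "cidoc:shows_visual_item", "obj2"),
    ("obj2", "cidoc:has_type", "church"),
    ("obj2", "cidoc:beginning_is_qualified_by", 1923),
]

PATTERNS = [
    ("?x", "rdf:type", "mpeg-7:Image"),
    ("?x", "cidoc:shows_visual_item", "?y"),
    ("?y", "cidoc:has_type", "church"),
    ("?y", "cidoc:beginning_is_qualified_by", "?year"),
]

# The AND clause of the query becomes a filter over the bindings.
results = [b["?x"] for b in match(PATTERNS, TRIPLES) if b["?year"] < 1900]

if __name__ == "__main__":
    print(results)
```

The nested scan is quadratic or worse and only suitable for a handful of triples; a real triple store would use indexes, but the binding logic is the same.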
Semantic Timeline
[Figure: ontology individuals placed on a timeline with the bands 1880-90, 1890-1900, 1900-10, 1910-20 and 1920-30; WW I is marked]
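Placing individuals on such a timeline amounts to grouping them into decade bands by year. The following is a minimal sketch under that assumption; `build_timeline` and the sample individuals are hypothetical.

```python
from collections import defaultdict


def build_timeline(individuals):
    """Group (name, year) pairs into decade bands in chronological order."""
    bands = defaultdict(list)
    for name, year in individuals:
        # 1895 -> 1890, 1902 -> 1900
        bands[year - year % 10].append(name)
    return [
        (f"{start}-{start + 10}", names)
        for start, names in sorted(bands.items())
    ]


if __name__ == "__main__":
    sample = [("church photo", 1895), ("old map", 1902), ("sawmill", 1899)]
    for band, names in build_timeline(sample):
        print(band, names)
```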
Semantic Map
[Figure: ontology individuals shown on a map, connected through the relations "is about" and "located at"]
Summary
• Discussed theories and technologies
  • Content Management
  • Metadata
  • Ontologies
  • Annotation
  • AJAX
• Demonstrated tools
  • LiveSearch
  • Ontology Editor
  • Public Search
  • Intelligent Folders
  • Semantic Timeline
  • Semantic Map
Resources
• Crofts N., Doerr M., Comprehensive Introduction to CIDOC CRM, ICOM/CIDOC DSG, web presentation, 2003. http://cidoc.ics.forth.gr/comprehensive_intro.html
• Hunter J., Combining the CIDOC CRM and MPEG-7 to Describe Multimedia in Museums, DSTC Pty Ltd., University of Queensland, Australia, 2002.
• Szász B., Cultural Heritage on the Semantic Web – the Museum24 project, to be published.
Thank you for your attention. Questions?
Barnabás Szász
bszasz@gmail.com
www.museo24.fi