Metadata and semantic web. Arto Vitikka Arctic Centre University of Lapland www.arcticcentre.org. Contents. Metadata introduction to metadata metadata on scientific data Open Archives Initiative examples Semantic web tools and technologies in Finland
Metadata and semantic web Arto Vitikka Arctic Centre University of Lapland www.arcticcentre.org
Contents Metadata • introduction to metadata • metadata on scientific data • Open Archives Initiative • examples Semantic web tools and technologies in Finland • introduction to semantic web technology • development work in Finland • examples
Introduction to metadata Sources used here: • Introduction to Metadata, Online Edition, version 3.0 , Tony Gill, Anne J. Gilliland, Maureen Whalen, and Mary S. Woodley, Edited by Murtha Baca. http://www.getty.edu/research/conducting_research/standards/intrometadata/ • Wikipedia • Data about data • Used in several domains: research, geographical information systems, libraries and social media (tags in Flickr, Del.icio.us)
Primary Functions of Metadata • Organization and description. A primary function of metadata is the description and ordering of original objects or items in a repository or collection, as well as of the information objects relating to the originals. • Creation, multiversioning and reuse of information objects. Multiple versions of the same object may be created for preservation, research, exhibit and dissemination. Administrative and descriptive metadata should be included by the creator or digitizer, especially if reuse is envisaged. • Searching and retrieval. Good descriptive metadata is essential to users’ ability to find and retrieve relevant metadata and information objects. • Validation. Metadata helps users ascertain the authoritativeness and trustworthiness of the information.
Primary Functions of Metadata /2 • Utilization and preservation. Metadata on information objects related to user annotations, rights tracking, and version control may be created. Digital objects also need to be subject to a continuous preservation regime and undergo processes such as refreshing, migration, and integrity checking to ensure their continued availability and to document any changes that might have occurred to the information object during preservation processes. • Disposition. Metadata is a key component in documenting the disposition (e.g., accessioning, deaccessioning) of original objects and items in a repository, as well as of the information objects relating to those originals.
Benefits of structured metadata The more highly structured an information object is, the more that structure can be exploited for searching, manipulation, and interrelating with other information objects and systems. Structured metadata then: • certifies the authenticity and degree of completeness of the content; • establishes and documents the context of the content; • identifies and exploits the structural relationships that exist within and between information objects; • provides a range of intellectual access points for an increasingly diverse range of users; and • supports the building of new services by integrating and reusing existing information sources.
The Open Archives Initiative, Protocol for Metadata Harvesting - OAI-PMH • The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. • The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. The essence of the open archives approach is to enable access to Web-accessible material through interoperable repositories for metadata sharing, publishing and archiving. • The OAI-PMH gives data providers a simple technical option for making their metadata available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and XML (Extensible Markup Language).
Definitions • Data Provider: maintains one or more repositories (web servers) that support the OAI-PMH as a means of exposing metadata. • Service Provider: issues OAI-PMH requests to data providers and uses the metadata as a basis for building value-added services. A Service Provider in this manner is "harvesting" the metadata exposed by Data Providers • Harvesting: refers specifically to the gathering together of metadata from a number of distributed repositories into a combined data store.
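The request a Service Provider sends to a Data Provider can be sketched as a plain HTTP GET URL. The verb and argument names (`Identify`, `ListRecords`, `metadataPrefix`, `from`) come from the OAI-PMH specification; the repository address and the helper `build_oai_request` are hypothetical and only serve as illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a data provider's OAI-PMH endpoint.
BASE_URL = "https://repository.example.org/oai"


def build_oai_request(base_url, verb, arguments=None):
    """Build the GET URL for one OAI-PMH request (illustrative helper)."""
    params = {"verb": verb}
    params.update(arguments or {})
    return base_url + "?" + urlencode(params)


# Ask the repository to describe itself.
identify_url = build_oai_request(BASE_URL, "Identify")

# Ask for all Dublin Core records changed since a given date.
harvest_url = build_oai_request(
    BASE_URL,
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2010-01-01"},
)

print(identify_url)
print(harvest_url)
```

A harvester would fetch such URLs over HTTP and receive XML responses; the network call is left out here so the sketch runs without a live repository.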
Services and applications • The metadata that is harvested may be in any format that is agreed by a community. Dublin Core is specified to provide a basic level of interoperability. • Thus, metadata from many sources can be gathered together in one database, and services can be provided based on this centrally harvested, or "aggregated" data. • The link between this metadata and the related content is not defined by the OAI protocol.
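As a rough illustration of aggregation, the sketch below parses Dublin Core titles out of two `ListRecords` responses and merges them into one list. The XML is an invented, heavily shortened response (real responses carry more elements), and the repository names, identifiers and titles are made up; only the namespace URIs are the standard OAI-PMH and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented, heavily shortened ListRecords response template.
RESPONSE_TEMPLATE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text, source):
    """Extract identifier and title from each record in one response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall(".//oai:record", NS):
        records.append({
            "source": source,
            "identifier": record.findtext(
                "oai:header/oai:identifier", namespaces=NS
            ),
            "title": record.findtext(".//dc:title", namespaces=NS),
        })
    return records


# Two hypothetical data providers, each returning one record.
responses = {
    "repository-a": RESPONSE_TEMPLATE.format(
        identifier="oai:a.example.org:1", title="Sample thesis"
    ),
    "repository-b": RESPONSE_TEMPLATE.format(
        identifier="oai:b.example.org:7", title="Sample report"
    ),
}

# The "aggregated" store: records from all providers in one place.
aggregated = []
for source, xml_text in responses.items():
    aggregated.extend(parse_records(xml_text, source))

for entry in aggregated:
    print(entry)
```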
Services and applications / 2 • OAI-PMH does not provide a search across this data; it simply makes it possible to bring the data together in one place. In order to provide services, the harvesting approach must be combined with other mechanisms. • OAI-PMH is technically very simple, but building coherent services that meet user requirements remains complex. • A number of software systems support the OAI-PMH: Fedora, GNU EPrints, Open Journal Systems, DSpace, DigiTool and MetaLib among others.
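One example of such an additional mechanism is a search index built on top of the harvested records. The following is a minimal sketch of a keyword index over titles, assuming the records have already been harvested; the records and the function names `build_index` and `search` are invented for illustration, and a real service would use a proper search engine.

```python
from collections import defaultdict

# Invented sample of already harvested records.
harvested = [
    {"id": "oai:a.example.org:1", "title": "Arctic climate observations"},
    {"id": "oai:b.example.org:7", "title": "Reindeer herding and climate"},
    {"id": "oai:b.example.org:8", "title": "Arctic tourism statistics"},
]


def build_index(records):
    """Map every lower-cased title word to the ids of records containing it."""
    index = defaultdict(set)
    for record in records:
        for word in record["title"].lower().split():
            index[word].add(record["id"])
    return index


def search(index, query):
    """Return ids of records whose title contains all words of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(harvested)
print(sorted(search(index, "arctic")))
print(sorted(search(index, "arctic climate")))
```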
Open Archives in Finland • Doria contains digital collections of Finnish universities and polytechnics. • The University of Lapland is now starting to implement an open archives system, integrated into Doria. • Work starts with master's theses, followed later by the publications of the staff and the Lapland University Press. • https://oa.doria.fi/
Examples • Map of OA repositories: http://maps.repository66.org/ • Registry of Open Access Repositories - http://roar.eprints.org/ • The aim of ROAR is to promote the development of open access by providing timely information about the growth and status of repositories throughout the world. • Arctic Open Archives application to serve the UArctic and the arctic science community?
More information Sources used here and more information: • Open Archives Forum - http://www.oaforum.org/tutorial/ • The Open Archives Initiative Protocol for Metadata Harvesting - http://www.openarchives.org/pmh/
Metadata on research data • Description of research data • Answers to questions: who, what, where, when, how and how to obtain the data • International metadata standard: • Directory Interchange Format (DIF) • used in Global Change Master Directory • Required, Highly recommended and Recommended fields • title, parameters, data center, summary, personnel, instrument, resolution, temporal and spatial coverage, etc.
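The three field levels lend themselves to a simple completeness check. The sketch below is an assumption-laden illustration: the field names are simplified from the list above and are not the official DIF element names, the assignment of each field to a level is assumed for the example, and the sample record is invented.

```python
# Illustrative grouping only: simplified field names, assumed levels.
FIELD_LEVELS = {
    "required": ["title", "parameters", "data_center", "summary"],
    "highly_recommended": [
        "personnel", "instrument", "temporal_coverage", "spatial_coverage",
    ],
    "recommended": ["resolution"],
}


def missing_fields(record):
    """List, per level, the fields that are absent or empty in a record."""
    report = {}
    for level, fields in FIELD_LEVELS.items():
        report[level] = [name for name in fields if not record.get(name)]
    return report


# Invented data set description with some gaps.
record = {
    "title": "Sample snow depth measurements",
    "parameters": ["snow depth"],
    "data_center": "Example Data Centre",
    "summary": "",
    "personnel": "Example, Person",
    "temporal_coverage": "2009-01-01/2009-12-31",
}

report = missing_fields(record)
for level, names in report.items():
    print(level, "->", names)
```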
Data portals • Global Change Master Directory (GCMD) • maintained by NASA • Earth science data sets and services relevant to global change • more than 30 000 descriptions of data and services • gcmd.nasa.gov • Antarctic Master Directory • part of GCMD • about 6 400 data descriptions (as of 3 March 2010) • national Antarctic data portals • IPY Metadata Portal • part of GCMD • 363 descriptions (as of 3 March 2010)
Benefits of metadata • Facilitate access to data and maximise the use of data • Avoid duplication of research and data collection • Improve efficiency of scientific data management • Facilitate new research through access to existing scientific data • Improve cooperation and interoperability between disciplines • Data may be valued more than the immediate publications it has generated • Scientists cannot be expected to know how their data may be used in the future
Semantic web • The Semantic web - the Internet of meanings - is the next generation of the Internet. • The idea of the semantic web is to make content understandable for machines by binding it to some formal and meaningful description. • Enables user communities to put machine-understandable contents on the web which can be shared and processed both by automated tools and people. • Integration and reuse of the information in new unforeseeable applications and domains is possible.
Semantic web / 2 • Ontologies are the infrastructure of the semantic web. • Ontologies serve to make metadata understandable by computers: they define the way descriptive terms are interrelated and used in a given domain of interest. • The semantic web concept makes finding the correct data and information more effective, while also helping to ensure the validity of the information and enabling language independence. • For example, does "Nokia" refer to the town, rubber boots, car tires, the Nokia company or a Nokia phone? • Or a semantic description attached to ‘Paris’ in a web page tells the computer explicitly that in this context the information is about the town of Paris, Texas, US.
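The disambiguation idea can be sketched with a handful of subject-predicate-object statements in the style of RDF. The `rdf:type` and `rdfs:label` URIs are the standard W3C ones; everything under `http://example.org/` is a hypothetical vocabulary made up for this example, and plain Python tuples stand in for a real RDF library.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"  # hypothetical namespace

# Several resources share a label but differ in type.
triples = [
    (EX + "nokia-town", RDFS_LABEL, "Nokia"),
    (EX + "nokia-town", RDF_TYPE, EX + "Town"),
    (EX + "nokia-company", RDFS_LABEL, "Nokia"),
    (EX + "nokia-company", RDF_TYPE, EX + "Company"),
    (EX + "paris-texas", RDFS_LABEL, "Paris"),
    (EX + "paris-texas", RDF_TYPE, EX + "Town"),
]


def resources_with_label(label):
    """All resources a plain keyword could refer to."""
    return {s for s, p, o in triples if p == RDFS_LABEL and o == label}


def resolve(label, wanted_type):
    """Resources with the given label that also have the wanted type."""
    candidates = resources_with_label(label)
    return sorted(
        s for s, p, o in triples
        if s in candidates and p == RDF_TYPE and o == wanted_type
    )


print(sorted(resources_with_label("Nokia")))  # ambiguous keyword
print(resolve("Nokia", EX + "Town"))          # explicit meaning
```

The keyword alone matches two resources, while the typed query returns exactly one, which is the effect the Nokia example describes.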
Semantic web / 3 • The development of the Semantic Web started about ten years ago. • The European Commission has funded related research and development projects. • The Semantic Computing Research Group at Aalto University has conducted Semantic Web technology development projects for several years. • The results include a variety of Semantic Web infrastructure services, like the Finnish Ontology Library Service, and open source semantic tools for creating applications. • Now we are at a stage where the Semantic Web is moving from being a vision to becoming reality.
Semantic web in Finland Services Finnish Ontology Library Service ONKI http://www.yso.fi/ The ONKI service contains Finnish and international ontologies, vocabularies and thesauri needed for publishing your content cost-efficiently on the Semantic Web. Ontologies are conceptual models identifying the concepts of a domain. They contain machine "understandable" descriptions of the relations between the concepts.
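What a machine can do with such relations may be shown with a toy concept hierarchy. This is a minimal sketch assuming a single "broader concept" relation; the concepts and documents are invented and are not taken from ONKI or any actual ontology.

```python
# child concept -> broader (parent) concept; invented examples
BROADER = {
    "reindeer": "mammals",
    "arctic fox": "mammals",
    "mammals": "animals",
    "birds": "animals",
}

# Invented documents, each indexed with one concept.
documents = {"doc1": "reindeer", "doc2": "birds", "doc3": "arctic fox"}


def narrower_concepts(concept):
    """Return the concept plus everything below it in the hierarchy."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in BROADER.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search(concept):
    """Find documents indexed with the concept or any narrower concept."""
    wanted = narrower_concepts(concept)
    return sorted(doc for doc, c in documents.items() if c in wanted)


print(search("mammals"))
print(search("animals"))
```

A search for a general concept also finds documents indexed only with more specific concepts, even though the general word never appears in their metadata.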
Semantic web in Finland /2 • Finnish General Upper Ontology (YSO) with ca. 20 000 concepts • Besides the general ontology there are several special ontologies • Ontologies have been created either based on existing vocabularies or from scratch • The Finnish General Upper Ontology has been made available for users (ontology developers, content indexers, information searchers) by setting up an ontology server and providing applications for integrating the ontology into existing content management systems • http://www.seco.tkk.fi/ontologies/
Semantic web open source tools Semantic Portal Building Tools • Lightweight multifaceted search engine for RDF data • Browser-based semantic annotation tool • Tool for Creating Semantic View-Based Search and Browsing Portals • Generic View-Based RDF Search Engine • A tool for creating static web sites based on semantic content. Semantic Information Extraction • A framework for automatic annotation • Automatic Information Retrieval Ontologically
Ontology services • National Ontology Service ONKI • Ontology repository • Ontology server for publishing vocabularies • Ontology Service for Geographical Data • Ontology Service for Finding People and Organizations • Ontology-based Annotation Assistant
Applications CultureSampo • Semantic web portal and a publication channel for Finnish cultural heritage. • Content comes from over 20 different Finnish museums, libraries, archives and other sources, as well as from the Getty Foundation and Wikipedia. • The system aggregates cross-domain content of various kinds including artifacts, paintings, sculpture, drawings, abstract art, novels, comics, web pages, folklore and runes of different kinds, fictive persons and places, folk music, photos, persons, organizations, historical events, videos, buildings, and cultural sites.
Applications / 2 TerveSuomi - HealthFinland - Demo • Metadata, ontology, and service infrastructure - based on W3C semantic web recommendations, a domain-specific metadata schema (Dublin Core application), and a set of ontologies and services provided within the National Ontology Service. • Semantic content creation process - for producing semantically annotated contents, based on the shared metadata model and ontologies. • Semantic portal HealthFinland - material is published via a semantic portal that creates a single national entry-point for health information, health promotion and health-related news. • The information is collected from a diverse group of sources including expert organizations, governmental institutes and NGOs. • A quality control process
Conclusions • Metadata enables the creation of new intelligent web services • Reuse and integration of information • Standards and tools already exist • Open source does not mean that the tools are free of cost • Building services still requires lots of work and good funding • Tools to build services for the arctic communities
Kiitos paljon! Tack så mycket! Thank You!