Explore the data mapping and conversion workflow from MARC to BIBFRAME in the SHARE Catalogue project. Discover the benefits of linked data technology and the integration of bibliographic datasets from multiple universities.
2019-03-06 ESSnet Hackathon in Rome The Libraries in a Linked Open Data environment The data mapping and the conversion workflow from MARC to BIBFRAME Tiziana Possemato @Cult
Universities share
• http://sharecampus.it
• Inter-university agreement (Convenzione interuniversitaria): http://www.sharecampus.it/media/b80db-convenzione_interuniversitaria.pdf
• Shared services charter (Carta dei servizi condivisi): http://www.sharecampus.it/media/4f255-carta_servizi.pdf
• SHARE Catalogue: http://www.sharecampus.it/main/static_page/share_catalogue and http://catalogo.share-cat.unina.it/sharecat/clusters
• SHARE Discovery: http://www.sharecampus.it/main/static_page/share_discovery and http://campania-primo.hosted.exlibrisgroup.com/primo_library/libweb/action/search.do?vid=39CAMP_V1&wroDevMode=true
• SHARE Press: http://www.sharecampus.it/main/static_page/share_press
SHARE Catalogue came into being as part of the SHARE project (Scholarly Heritage and Access to Research), within an inter-university agreement tailored specifically for library services and implemented in a three-year program (2013-2015) for the creation of an integrated system to allow the use and management of resources among: • Università degli Studi Suor Orsola Benincasa • Università degli Studi del Sannio • Università degli Studi di Napoli L'Orientale • Università degli Studi di Napoli Federico II (capofila, i.e. lead institution) • Università degli Studi di Salerno • Università degli Studi di Napoli Parthenope • Università degli Studi della Campania Luigi Vanvitelli • Università degli Studi della Basilicata
SHARE collaborators: • Università della Basilicata: Prof. Maurizio Martirano, Dott.ssa Antonella Trombone, Sig. Franco Claps • Università degli Studi di Napoli Federico II: Prof. Roberto Delle Donne, Dott.ssa Maria Grazia Ronca, Ing. Giovanni Barone, Dott.ssa Stefania Castanò, Dott.ssa Paola Denunzio, Ing. Amerigo Izzo, Dott.ssa Valeria Locastro, Dott. Nicola Madonna, Dott.ssa Anna Tafuto • Università degli Studi di Napoli L’Orientale: Prof. Francesco Sferra, Dott. Mario Vitalone, Dott.ssa Stefania Marchi • Università degli Studi di Napoli Parthenope: Prof. Riccardo Marselli, Dott.ssa Rosa Maiello, Sig. Gabriele Saurini, Dott. Nunzio Napolitano, Sig.ra Antonietta Cutillo, Dott. Giovanni Mormile • Università degli Studi di Salerno: Prof.ssa Daniela Valentino, Dott. Marcello Andria, Dott. Isidoro D’Auria, Dott.ssa Patrizia De Martino, Sig. Salvatore De Filippis, Ing. Massimiliano Cilurzo • Università degli Studi del Sannio: Prof. Francesco Paolo Mancini, Dott.ssa Loredana Cerrone, Prof. Nicolino D’Ortona, Prof. Ciro Visone
LINKED DATA and SHARE Catalogue The philosophy behind linked data technology provides the starting point from which our aim can be reached. Open data are data provided by institutions and made freely available for consultation; the only conditions for re-use are the obligation to cite sources and respect for the integrity of the data.
Legislative decree of 18 May 2015, n. 102, modifying the previous legislative decree of 24 January 2006, n. 36 The decision, taken by the Universities in the Convention, to provide free access to their bibliographic datasets refers to the rules on the re-use of information produced by or owned by the public sector. The Legislative Decree of 2006 and, subsequently, that of 2015 have extended their scope to public libraries, archives and museums and strengthened their obligations to facilitate the search and retrieval of information using appropriate metadata and systems that comply with open data standards through the use of new information and communication technologies.
There are multiple and exponential advantages: • from the reduction of duplication of the information; • to the possibility of sharing for an efficient use of resources; • to the ability to supply high-quality data that can be re-used. Linked data technology makes it possible to integrate data from different contexts and use them unambiguously, based on recording techniques that allow re-use.
Within the project, analysis of the data was carried out starting from the MARC format, the primary source of information due to the amount of data it contains and to the high level of semantics present: each element was analyzed to identify its use in the record and in the catalogue as a whole. The traditional record in MARC format, transformed into RDF, has been deconstructed into a set of data in which every single element acquires meaning through its arrangement with other data and through the generation of assertions that can be re-used by the different communities operating on the web.
Libraries, which have always produced quality and authoritative data in highly structured bibliographic records, responding to shared and widespread rules, now more than ever assume the role of "quality generators" for the network. We are moving towards a new evolutionary stage that will see a radical transformation taking place: • of the catalogue and bibliographic data; • of the relationship between user and catalogue; • of the relationship between the catalogue and the larger information sphere.
BIBFRAME • The SHARE Catalogue data model
BIBFRAME – Bibliographic Framework Initiative The document Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services, published by the Library of Congress on November 21, 2012, sets out a new data model designed as an evolution, in linked open data, of the MARC 21 format. The reflections on the new cataloguing rules focus on some specific points, including: • a greater level of identification and analysis of the data; • greater attention to controlled vocabularies; • more widespread use of terms instead of codes; • emphasis on relationships; • greater flexibility in controlled items.
BIBFRAME – Data model v. 2.0 “In translating the MARC 21 format to a Linked Data model it is important to deconstruct and then reconstruct the informational assets that comprise MARC”. The BIBFRAME Model, version 2.0 (published on 21 April 2016), consists of the following core classes: • Work: The highest level of abstraction, a Work, in the BIBFRAME context, reflects the conceptual essence of the cataloged resource: authors, languages, and what it is about (subjects). • Instance: A Work may have one or more individual, material embodiments, for example, a particular published form. These are Instances of the Work. An Instance reflects information such as its publisher, place and date of publication, and format. • Item: An Item is an actual copy (physical or electronic) of an Instance. It reflects information such as its location (physical or virtual), shelf mark, and barcode.
BIBFRAME – Data model v. 2.0 BIBFRAME 2.0 further defines additional key concepts that have relationships to the core classes: • Agents: Agents are people, organizations, jurisdictions, etc., associated with a Work or Instance through roles such as author, editor, artist, photographer, composer, illustrator, etc. • Subjects: A Work might be “about” one or more concepts. Such a concept is said to be a “subject” of the Work. Concepts that may be subjects include topics, places, temporal expressions, events, works, instances, items, agents, etc. • Events: Occurrences, the recording of which may be the content of a Work.
BIBFRAME – The vocabulary The BIBFRAME Vocabulary comprises the RDF properties and classes, and the relationships between and among them: • Classes include the three core classes (Work, Instance and Item) as well as various additional classes, many of which are subclasses of the core classes. • Properties describe characteristics of the resource being described as well as relationships among resources. For example: one Work might be a “translation of” another Work; an Instance may be an “instance of” a particular BIBFRAME Work. Other properties describe attributes of Works and Instances. For example: the BIBFRAME property “subject” expresses an important attribute of a Work (what the Work is about), and the property “extent” (e.g. number of pages) expresses an attribute of an Instance.
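As a rough illustration of how the three core classes relate, the sketch below writes a Work, an Instance and an Item as RDF triples in N-Triples syntax, using only the Python standard library. The `http://example.org/` URIs are hypothetical placeholders, not SHARE Catalogue identifiers; the class and property names (`Work`, `Instance`, `Item`, `instanceOf`, `itemOf`) are taken from the BIBFRAME 2.0 vocabulary.

```python
# Minimal sketch: the BIBFRAME 2.0 core classes expressed as N-Triples.
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"  # hypothetical base URI, for illustration only


def uri(value):
    """Wrap an IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


work, instance, item = EX + "work/1", EX + "instance/1", EX + "item/1"

triples = [
    # The Work: the conceptual essence of the resource.
    (uri(work), uri(RDF_TYPE), uri(BF + "Work")),
    (uri(work), uri(RDFS_LABEL), literal("Promessi sposi")),
    # The Instance: one published embodiment, linked up to its Work.
    (uri(instance), uri(RDF_TYPE), uri(BF + "Instance")),
    (uri(instance), uri(BF + "instanceOf"), uri(work)),
    # The Item: one actual copy, linked up to its Instance.
    (uri(item), uri(RDF_TYPE), uri(BF + "Item")),
    (uri(item), uri(BF + "itemOf"), uri(instance)),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object) rows, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

Each layer points upward to the one above it, which is what allows a portal to navigate from a Work down to publications and then to individual copies.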
The BIBFRAME • three-layer Portal
1st layer: Persons/Works http://catalogo.share-cat.unina.it/sharecat/searchNames?n_cluster_id=5520
http://catalogo.share-cat.unina.it/sharecat/searchNames?n_cluster_id=167993
http://catalogo.share-cat.unina.it/sharecat/searchTitles?t_cluster_id=21962
2nd layer: Instances or publications The level of the Instances, which can be associated more generically with publications, is built from the catalogue data which, through appropriate conversion and matching processes, are linked to the upper level through the titles of the Works present. The bibliographic data are indexed in the Solr search engine, which makes it possible to produce different data aggregations (e.g. publication date, language, publisher, edition, etc.), providing a wide range of search and navigation functions.
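To make the idea of "data aggregations" concrete, the sketch below counts how many Instances share each value of a field, which is what a facet shows in a search interface. It is a plain-Python illustration, not the Solr API (Solr computes facets server-side), and the sample records and their values are invented for the example.

```python
from collections import Counter

# Invented sample Instances; field names and values are illustrative only.
instances = [
    {"title": "Promessi sposi", "language": "ita", "publisher": "Publisher A"},
    {"title": "Promessi sposi", "language": "ita", "publisher": "Publisher B"},
    {"title": "The betrothed", "language": "eng", "publisher": "Publisher A"},
]


def facet_counts(records, field):
    """Count the records per distinct value of `field`, skipping empty values."""
    return Counter(record[field] for record in records if record.get(field))


for field in ("language", "publisher"):
    print(field, dict(facet_counts(instances, field)))
```

A user who selects one facet value is, in effect, filtering the Instances of a Work down to those that share that value.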
Example for Promessi sposi: the result of clicking on the title Promessi sposi
3rd layer: Items • In the third level, the Portal integrates with local systems: the records relating to publications from the second level are linked to information on copies and their availability in the individual catalogues. Libraries do not give up the specific features of their OPACs but rather take advantage of them.
Our BIBFRAME projects' overall goals The main areas of our projects: • Enrichment of MARC records with URIs • Conversion from MARC to RDF using the BIBFRAME vocabulary (and other additional ontologies as needed) • Data publication according to the BIBFRAME data model • Batch/automated data updating procedures • Batch/automated data dissemination to libraries
The SHARE projects' heart: Entity identification, Reconciliation, Data enrichment and BIBFRAME Conversion
The new revolution: from record to entity [Diagram: variant forms from different records grouped into entities] • Name: Shakespeare, William, 1564-1616; Шекспир, У. 1564-1616; Уильям; Saixpēr, Gouilliam, 1564-1616 • Title: As you like it [print]; As you like it; As you like it [on-line]; Come ti piace; Comme il vous plaira • Publisher: Cambridge University Press; Cambridge Press; Cambridge Univ. Press • Title: Fathers and daughters; Padri e figlie; Pères et filles
How reconciliation is obtained The aim of these processes is to bring together data from different sources and make them available in a way that could be defined as democratic, to better identify the entity in question. Data reconciliation and enrichment are obtained by: • automated processes • manual processes It is important to underline how the balance between reconciliation and validation of the results can differ greatly between the automated and manual processes: • automated processes: a high level of reconciliation and clustering; a low level of result validation; • manual processes: a low level of reconciliation and clustering; a high level of result validation.
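The trade-off between the two kinds of process can be illustrated with a minimal sketch of automated clustering. It assumes a simple normalized "match key" (case folded, diacritics and punctuation removed); the actual matching rules used in the SHARE projects are not described here, so `match_key` and `cluster_forms` are hypothetical helpers. Forms that end up alone in a cluster are the natural candidates for manual review.

```python
import unicodedata
from collections import defaultdict


def match_key(name):
    """Build a normalized key: fold case, strip diacritics and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() else " " for ch in without_marks.casefold())
    return " ".join(kept.split())


def cluster_forms(forms):
    """Group name forms that share the same match key."""
    clusters = defaultdict(list)
    for form in forms:
        clusters[match_key(form)].append(form)
    return dict(clusters)


forms = [
    "Shakespeare, William, 1564-1616",
    "SHAKESPEARE, William, 1564-1616.",  # hypothetical variant, differs in case and punctuation
    "Шекспир, У. 1564-1616",
]

clusters = cluster_forms(forms)
# Single-member clusters could not be reconciled automatically.
needs_review = [members[0] for members in clusters.values() if len(members) == 1]
print(clusters)
print(needs_review)
```

String normalization merges the two Latin-script forms but cannot connect the Cyrillic form to them; that link has to come from shared identifiers or from a cataloguer, which is where the manual process and external sources contribute.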
Entities in cluster: an example of collaboration and sharing The result of a reconciliation of the entity Antonio Vivaldi in the SHARE-VDE project, with data from different sources and projects: • the authorized form from a local authority file • the variant forms originating from the references on the local authority records • the variant forms originating from the VIAF • the forms of the name used in the bibliographic records. The cluster is completed and enriched with identifiers for the same entity, Antonio Vivaldi, from sources such as: • Wikidata • Library of Congress Name Authority File • Data.bnf.fr • VIAF
An example of Work/Instances reconciliation The many publication titles in the catalogue for Cimento dell’armonia e dell’inventione are grouped under a single work title. The single work title brings together different publications/resources present in different catalogues.
BIBFRAME projects • Process overview
The SHARE projects' processes [Diagram; components shown: Marc enriched/URIs • OliSuite: manual process • Database of relationships • Knowledge base of clusters • Lodify • RDF/Bibframe dataset • Dump db • APIs • External sources • SHARE-VDE Portal]
Focus on processes 1/2 [Diagram; elements shown: Bibliographic records (BIB1, BIB2, BIB…) • Authority records • Marc enriched (.pxml) • Clusters Knowledge Base]
Focus on processes 2/2 [Diagram; elements shown: Clusters Knowledge Base • Marc enriched (.pxml) (BIB1, BIB2, BIB…) • Marc enriched (Binary) (one for LIB) • Lodify • Stardog • External (VIAF) URIs • SHARE-VDE URIs]
The SHARE technology heart: the LOD Platform
The LOD Platform The LOD Platform is a highly innovative technological system for handling bibliographic catalogues and transforming them into Linked Open Data. The LOD Platform uses BIBFRAME as its ontology, but is able to combine and add any other ontologies and data models required by each specific project. The LOD Platform is developed and maintained by @Cult, a software company based in Rome, particularly focused on linked data projects.
The LOD Platform • The system allows: • data analysis and management, to identify and group (clusterization process) the entities; • data enrichment through links with external projects; • bibliographical and authority data conversion, according to the standard models foreseen by the W3C for LOD, RDF (Resource Description Framework), using vocabularies and ontologies; • publication of the dataset in LOD on Triple Store (RDF); • user-friendly ways of finding data, through a portal with navigational tools based on BIBFRAME or another data model (FRBR/LRM …).
The LOD Platform • Components of the technological architecture: • AUTHIFY, a RESTful module that provides bibliographic, authority and full-text search services over external datasets, mainly authority files (VIAF, Library of Congress Name Authority File, …) but extendable to other types of datasets; • CLUSTER KNOWLEDGE BASE, on a PostgreSQL database, the result of the data identification, enrichment, and clusterization processes; • LODIFY, a RESTful module that automates the entire process of data conversion to RDF format; • BLAZEGRAPH and/or STARDOG, the Triple Store (open source or proprietary) for storing RDF files; • PORTAL SKIN, the instance of the data publication portal.
From MARC to BIBFRAME: • the conversion process
The starting point: the LOC MARC-to-BF mapping The conversion from MARC 21 to BIBFRAME starts from the Library of Congress mapping rules. This official source is extended or modified to manage the special requirements of different projects. Here, as an example, is the processing of MARC tag 300 (Physical Description):
The starting point: the LOC MARC-to-BF mapping And here, as an example, is the processing of MARC tag 100 (Main Entry-Personal Name):
Lodify - Conversion templates
• Lodify converts each incoming record by means of conversion templates.
• Each template associates:
• a MARC record belonging to the incoming data stream
• with a set of (conversion) rules associated with the BIBFRAME vocabulary
(here, two rules for tag 300a and tag 300c)
Input record:
001 27283
300 $a108 p. ;$c15 cm.
Rule R300c produces:
<http://share-vde.org/sharevde/rdfBibframe/Instance/27293> <bf:dimensions> "15 cm."
Rule R300a produces:
<http://share-vde.org/sharevde/rdfBibframe/Instance/27293> <bf:extent> <http://share-vde.org/sharevde/rdfBibframe2/Extent/02e3f96a>
<http://share-vde.org/sharevde/rdfBibframe2/Extent/02e3f96a> <rdfs:label> "108 p. ;"
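The template idea above can be sketched as a lookup table from (tag, subfield) to a rule function that emits triples. This is only an illustration of the pattern, not Lodify's implementation: the function names, the single `BASE` URI pattern, and the way the Extent node identifier is derived (a truncated SHA-1 of the label) are all assumptions made for the example.

```python
import hashlib

BASE = "http://share-vde.org/sharevde/rdfBibframe"  # URI pattern simplified for the sketch
BF = "http://id.loc.gov/ontologies/bibframe/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def parse_subfields(data):
    """Split '$a108 p. ;$c15 cm.' into [('a', '108 p. ;'), ('c', '15 cm.')]."""
    return [(chunk[0], chunk[1:]) for chunk in data.split("$") if chunk]


def rule_300a(instance, value):
    """300 $a -> bf:extent pointing to an Extent node that carries the label."""
    # Assumption: the node id is a short hash of the label; the real scheme is unknown.
    node_id = hashlib.sha1(value.encode("utf-8")).hexdigest()[:8]
    node = f"{BASE}/Extent/{node_id}"
    return [(instance, BF + "extent", ("uri", node)),
            (node, RDFS_LABEL, ("lit", value))]


def rule_300c(instance, value):
    """300 $c -> bf:dimensions as a literal on the Instance."""
    return [(instance, BF + "dimensions", ("lit", value))]


# The "template": which rule handles which tag/subfield pair.
TEMPLATE = {("300", "a"): rule_300a, ("300", "c"): rule_300c}


def convert(instance_id, tag, data):
    """Apply every matching rule to one MARC field and collect the triples."""
    instance = f"{BASE}/Instance/{instance_id}"
    triples = []
    for code, value in parse_subfields(data):
        rule = TEMPLATE.get((tag, code))
        if rule is not None:
            triples.extend(rule(instance, value))
    return triples


for triple in convert("27293", "300", "$a108 p. ;$c15 cm."):
    print(triple)
```

Keeping rules in a table keyed by tag and subfield makes it straightforward to extend or override the Library of Congress mapping for a specific project without touching the conversion loop.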
Lodify - Conversion templates
(here, the rule for tag 100a, subfields 0/1)
Input field:
100 10$aCastro, Juan Antonio.$1http://share-vde.org/sharevde/rdfBibframe/Agent/1801277$0http://id.loc.gov/authorities/names/n88638722$1http://www.isni.org/isni/0000000059264386$1http://www.wikidata.org/entity/Q21480742$1http://viaf.org/viaf/63125720/
Rule R100a01 produces:
<http://share-vde.org/sharevde/rdfBibframe/Agent/1801277> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <bf:Person> .
<http://share-vde.org/sharevde/rdfBibframe/Agent/1801277> <rdfs:label> "Castro, Juan Antonio." .
<http://share-vde.org/sharevde/rdfBibframe/Agent/1801277> <bflc:name00MatchKey> "Castro, Juan Antonio." .
<http://share-vde.org/sharevde/rdfBibframe/Agent/1801277> <bflc:name00MarcKey> "10010$aCastro, Juan Antonio.$1http://share-vde.org/sharevde/rdfBibframe/Agent/1801277$0http://id.loc.gov/authorities/names/n88638722$1http://www.isni.org/isni/0000000059264386$1http://www.wikidata.org/entity/Q21480742$1http://viaf.org/viaf/63125720/" .
<http://share-vde.org/sharevde/rdfBibframe/Agent/1801277> <bflc:primaryContributorName00MatchKey> "Castro, Juan Antonio." .
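A minimal sketch of how an enriched field 100 like the one above could be read: the subfields are split, the `$1` URI under `share-vde.org` is taken as the Agent's own identifier, and the remaining `$0`/`$1` URIs are kept as external identifiers. The function names are hypothetical, and two choices are assumptions for the example: selecting the subject by URI prefix, and deriving the class from the first indicator (0 or 1 for a personal name, 3 for a family name, following the MARC 21 definition of field 100).

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LOCAL_PREFIX = "http://share-vde.org/"  # assumption: local Agent URIs start with this

# Assumption: class chosen from the first indicator of field 100.
CLASS_BY_INDICATOR = {"0": "Person", "1": "Person", "3": "Family"}


def parse_field(raw):
    """'100 10$a...' -> (tag, indicators, [(subfield code, value), ...])."""
    head, _, rest = raw.partition("$")
    tag, indicators = head[:3], head[3:].strip()
    subfields = [(chunk[0], chunk[1:]) for chunk in rest.split("$") if chunk]
    return tag, indicators, subfields


def convert_100(raw):
    """Return (triples describing the Agent, list of external identifier URIs)."""
    _tag, indicators, subfields = parse_field(raw)
    name = next(value for code, value in subfields if code == "a")
    uris = [value for code, value in subfields if code in ("0", "1")]
    subject = next(u for u in uris if u.startswith(LOCAL_PREFIX))
    external = [u for u in uris if u != subject]
    agent_class = CLASS_BY_INDICATOR.get(indicators[:1], "Agent")
    triples = [
        (subject, RDF_TYPE, BF + agent_class),
        (subject, RDFS_LABEL, name),
    ]
    return triples, external


field = ("100 10$aCastro, Juan Antonio."
         "$1http://share-vde.org/sharevde/rdfBibframe/Agent/1801277"
         "$0http://id.loc.gov/authorities/names/n88638722"
         "$1http://www.isni.org/isni/0000000059264386"
         "$1http://www.wikidata.org/entity/Q21480742"
         "$1http://viaf.org/viaf/63125720/")

triples, external = convert_100(field)
for triple in triples:
    print(triple)
print(external)
```

The external URIs are returned separately rather than asserted with a particular predicate, since the slide does not show how those links are expressed in the published dataset.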
Conclusions The SHARE Catalogue project is one of the main results of a project of cooperation and sharing of experiences and resources between Universities in Campania, Basilicata and Salento. This is an initiative that aims to facilitate the user experience in libraries spread over a vast geographical territory: rendering their catalogues, with their wealth of resources and their specificity, navigable and usable in a few steps according to data organisation models (such as BIBFRAME) that arise from an observation of the user’s information and search needs. In a simple and intuitive way, it allows researchers to find their way in a vast world of information and easily meet their needs. The project also aims to create a working group, formed by cataloguers and other experts willing to share their experience and expertise to improve the information on offer to users.
The SHARE Family The SHARE Catalogue project opened the way for a larger community of similar projects, identified as the SHARE family: the projects included in the SHARE family are promoted by libraries to establish procedures for identifying and reconciling entities, converting data to Linked Data, and creating a virtual discovery environment based on the three-tiered structure of the BIBFRAME data model. SHARE-VDE (Virtual Discovery Environment) is a collaborative effort based on the needs of the different participating libraries. It is the result of the collaboration between Casalini Libri (a bibliographic agency and producer of bibliographic data and authority records as a member of the Program for Cooperative Cataloging) and @Cult (a supplier of Integrated Library System (ILS) and discovery tools), with the initial input of 16 North American university libraries.
The SHARE Family SHARE-Art, a prototype for the Max Planck Institut art history libraries such as the Zentralinstitut für Kunstgeschichte in Munich, the Kunsthistorisches Institut in Florence, the Bibliotheca Hertziana in Rome and the Centre Allemand d'Histoire de l'Art in Paris (Kubikat group); one of the peculiarities of this project is that it provides for the integration of the photo library and builds a bridge that lets the user move from the library to the museum collections. SHARE-Music, an experiment in the field of music with the participation of the Bayerische Staatsbibliothek in Munich, the Library of Congress and Stanford University; this project deals with very specific aspects such as, by way of example, the definition and management of the Work at its various levels and the application of the PMO ontology (Performed Music Ontology).