This tutorial provides an introduction to CERIF, a common format for research information. It covers the conceptual model, metadata-centric approach, and the semantic layer of CERIF, as well as its evolution and ongoing activities.
CERIF 2008 Tutorial Brigitte Jörg, M.A. (Information Science) Language Technology Lab, German Research Center for Artificial Intelligence (DFKI) Saarbrücken, Germany
Outline • Introduction of Speaker • What is CERIF? • Grounding Explanations • Model • Metadata (Data)-centric • Research Information • The Conceptual CERIF Model • Entities • Relationships • Structure • The CERIF Semantic Layer in some Detail • The CERIF Evolution, Aim and Ongoing Activities • The CERIF 2008 Release
Introduction of Speaker Brigitte Jörg M.A. Information Science Information Systems, Business Administration • Project Manager, Researcher Language Technology Lab, DFKI GmbH Saarbrücken • CERIF TG Leader, Board Member euroCRIS • Contact: brigitte.joerg @ dfki.de http://www.dfki.de/~brigitte/
What is CERIF ? Common European Research Information Format [Diagram: research entities (Funding Programme, Organisation, Person, Project, Service, Skills, Publication, Equipment, CV, Patent, Product, Event, Classification) connected through Semantics]
What is CERIF ? Common European Research Information Format • A Concept about Research Entities and their Relationships (Specification) • A Description of Research Entities and their Relationships (Model) • A Formalization of Research Entities and their Relationships (Database Scripts) • SQL Script: CREATE Table Person, CREATE Table Project, CREATE Table OrgUnit
What is CERIF ? Common European Research Information Format • data model (data-centric) • allows for a (metadata) representation of • research entities • their activities / interconnections (research) • their output (results) • offers high flexibility with (semantic) relationships • enables quality maintenance, archiving, access and interchange of research information • supports knowledge transfer to decision makers, for research evaluation, research managers, strategists, researchers, editors, the general public
What is a model ? [Diagram: example elements A, B, C, D, F, G, X, Z linked by relationships such as "informs", "is part of", "depends on", "waits for"] • … is a simplified view to describe a particular area of interest • … allows for better communication between parties (mutual understanding) • … supports (re-)design decisions • … supports workflow identification • … supports documentation • … can be exchanged, re-used, iterated, extended
What is Metadata ? Support a Wide Range of Operations
• "Metadata is structured data which describes the characteristics of a resource." (An Introduction to Metadata, by Chris Taylor, University of Queensland)
• "Metadata is sometimes defined literally as 'data about data,' but the term is normally understood to mean structured data about resources that can be used to help support a wide range of operations. These might include, for example, resource description and discovery, the management of information resources and their long-term preservation." (Metadata in a Nutshell, by Michael Day, UKOLN)
What is Metadata ? Data about Data
• Structure: Type of Resource; Title; Description; Source; Date; Author, Creator, …
• Book: Title: The Hitchhiker's Guide to the Galaxy; Date of Publication: 1979
• Radio Series: Title: The Hitchhiker's Guide to the Galaxy; Description: is a science fiction comedy series created by Douglas Adams. Originally a radio comedy broadcast on BBC Radio 4 in 1978, […]; Source: Wikipedia; Date of Query: May 30, 2008
• Series of five Books: Title: The Hitchhiker's Guide to the Galaxy; Between: 1979 - 1982
• TV Series: Title: The Hitchhiker's Guide to the Galaxy; Screened: 1981
• Computer Game: Title: The Hitchhiker's Guide to the Galaxy; Released: 1984
• Game Cover Image: The Hitchhiker's Guide to the Galaxy; Source: http://egotron.com/; Retrieved: May 30, 2008
• Comic Book Adaptations: Title: The Hitchhiker's Guide to the Galaxy; Between: 1993 – 1996
• Links: http://www.bbc.co.uk/cult/hitchhikers/ (HTML-Title: Cult – The Hitchhiker's Guide to the Galaxy); http://en.wikipedia.org/wiki/The_Hitchhiker's_Guide_to_the_Galaxy (HTML-Title: The Hitchhiker's Guide to the Galaxy)
What is Data-centric ? [Diagram with the labels "Data" and "Metadata" in the center, surrounded by record templates and figures]
Record templates:
• Publication: URI, Type, Title, PartOf, PublDate
• Organisation: URI, Name, Abbreviation, Publications, Academic Staff
• Organisation: URI, Name, hasAccess, EndOfAccess, ContactPerson
• CitationTypes: Type, Description
Figures:
• Journal Publications 2007: Institute A = 4, Institute B = 10, Institute C = 9
• Article Requests 2007: Journal X = 4, Journal Y = 0, Journal Z = 15
• PhD Students 2008: Computer Science = 200, Physics = 50, Social Sciences = 9
• Journal Subscriptions: Journal X = 1990 - 2000, Journal Y = 2005 - 2010, Journal Z = 2001 - 2010
• Ends in 2010: Journals Y, Z
• First Author / No of Papers: Person H = 10/35, Person I = 4/12, Person J = 1/10
• Citations in 2007: Paper M (published 2007) = 20, Paper N (published 2004) = 100, Paper O (published 2001) = 0
What is Data-centric ? • Data / Metadata in the center • Data Maintenance, Curation, Preservation and Quality are a major Interest • Enabling added-value Services based on high-quality Data • Enabling requested views for various stakeholders based on high-quality Data
What is Research Information ? Data/Metadata or Information about: • Scientists • Project Managers • Ongoing and Completed Projects • Research Departments • Funding Organisations and Programmes • Research Results • Publications • Equipment • … • their time-dependent Relationships (Semantics)
Common European Research Information Format • A model to manage Research Information • Research Entities: Project, Person, Organisation, Publication, Funding Programme, Service, Equipment, Patent, Product, … • Activities / Interconnections in their Context: Relationships; Semantics / Roles / Types • Purpose: for Exchange, for Interoperability, for Implementation of CRISs (Current Research Information Systems)
CERIF Structure • Core Entities • 2nd Level Entities • Language-related Entities • Link Entities • Classification Entities (Semantic Layer)
Core Entities
• Publication: ID, URI, Title, Subtitle, Abstract, Bibl. Note, PublicationDate, TotalPages, StartPage, EndPage, Keywords
• Person: ID, URI, Sex, FirstNames, OtherNames, FamilyNames, NameVariants, ResearchInterest, Keywords
• Project: ID, URI, Acronym, StartDate, EndDate, Title, Abstract, Keywords
• Organisation: ID, URI, Acronym, Name, HeadCount, CurrencyCode, Turnover, ResearchActivity, Keywords
2nd Level Entities
• Event: ID, URI, Name, FeeOrFree, StartDate, EndDate, CityTown, CountryCode, Description, Keywords
• FundingProgramme: ID, URI, Name, CurrencyCode, Budget, StartDate, EndDate, Description, Keywords
• ResultPatent: ID, URI, PatentNumber, Title, CountryCode, RegistrationDate, ApprovalDate, Description, Keywords
• Facility: ID, URI, Name, Description, Keywords
• Service: ID, URI, Name, Description, Keywords
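As a rough illustration of how the core entities listed above could be formalized in a database script, here is a minimal sketch using Python's built-in sqlite3 module. The column names follow the slide; the data types, the choice of ID as primary key, the BiblNote spelling and the sample row are assumptions made for this example and are not taken from the official CERIF scripts.

```python
import sqlite3

# Hypothetical, simplified DDL: column names follow the slide, while the data
# types and key choices are assumptions, not the official CERIF scripts.
CORE_SCHEMA = """
CREATE TABLE Publication (
    ID TEXT PRIMARY KEY, URI TEXT, Title TEXT, Subtitle TEXT, Abstract TEXT,
    BiblNote TEXT, PublicationDate TEXT, TotalPages INTEGER,
    StartPage INTEGER, EndPage INTEGER, Keywords TEXT
);
CREATE TABLE Person (
    ID TEXT PRIMARY KEY, URI TEXT, Sex TEXT, FirstNames TEXT, OtherNames TEXT,
    FamilyNames TEXT, NameVariants TEXT, ResearchInterest TEXT, Keywords TEXT
);
CREATE TABLE Project (
    ID TEXT PRIMARY KEY, URI TEXT, Acronym TEXT, StartDate TEXT, EndDate TEXT,
    Title TEXT, Abstract TEXT, Keywords TEXT
);
CREATE TABLE Organisation (
    ID TEXT PRIMARY KEY, URI TEXT, Acronym TEXT, Name TEXT, HeadCount INTEGER,
    CurrencyCode TEXT, Turnover REAL, ResearchActivity TEXT, Keywords TEXT
);
"""


def create_core_entities():
    """Create the four core entity tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(CORE_SCHEMA)
    return conn


conn = create_core_entities()

# Made-up sample record, only to show that the tables accept data.
conn.execute(
    "INSERT INTO Project (ID, Acronym, Title, StartDate) VALUES (?, ?, ?, ?)",
    ("proj-1", "DEMO", "A made-up example project", "2008-01-01"),
)

table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
```

Dates are stored as ISO 8601 strings here because SQLite has no dedicated date type; a production CRIS would normally use the date type of its own database system.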
Language-related Entities
• Publication: Title [language], Abstract [language], Keywords [language]
• Organisation: Name [language], ResearchActivity [language], Keywords [language]
• Patent: Name [language], Description [language], Keywords [language]
• Product: Name [language], Description [language], Keywords [language]
• Service: Name [language], Description [language], Keywords [language]
• Facility: Name [language], Description [language], Keywords [language]
• Project: Title [language], Abstract [language], Keywords [language]
• Person: ResearchInterest [language], Keywords [language]
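One way to read the language-related entities is that each multilingual attribute moves into its own table keyed by the entity ID plus a language code. The sketch below illustrates this for the publication title only. The table name PublicationTitle, the column LangCode, the fallback rule and the sample titles are all hypothetical choices for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Publication (ID TEXT PRIMARY KEY, PublicationDate TEXT);
    -- Hypothetical language-related table: one row per publication and language.
    CREATE TABLE PublicationTitle (
        PublicationID TEXT NOT NULL REFERENCES Publication(ID),
        LangCode      TEXT NOT NULL,
        Title         TEXT NOT NULL,
        PRIMARY KEY (PublicationID, LangCode)
    );
    """
)
conn.execute("INSERT INTO Publication VALUES ('publ-1', '2008-05-30')")
conn.executemany(
    "INSERT INTO PublicationTitle VALUES (?, ?, ?)",
    [
        ("publ-1", "en", "An Example Title"),
        ("publ-1", "de", "Ein Beispieltitel"),
    ],
)


def title_in(conn, publication_id, lang, fallback="en"):
    """Return the title in `lang`, or in `fallback` if that language is missing."""
    for code in (lang, fallback):
        row = conn.execute(
            "SELECT Title FROM PublicationTitle "
            "WHERE PublicationID = ? AND LangCode = ?",
            (publication_id, code),
        ).fetchone()
        if row:
            return row[0]
    return None


print(title_in(conn, "publ-1", "de"))
print(title_in(conn, "publ-1", "fr"))
```

Keeping translations in a separate table means a new language can be added as new rows, without changing the structure of the Publication table.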
Link Entities [Diagram: the entities Person, Project, OrganisationUnit (OrgUnit) and ResultPublication are connected pairwise through the link entities Person_OrganisationUnit, Project_Person, Project_OrganisationUnit (Project_OrgUnit), Person_ResultPublication, Project_ResultPublication and OrgUnit_ResultPublication]
Link Entities
• Person_Publication (links Person and Publication): persID, publID, Classification, ClassificationScheme, StartDate, EndDate
• Person_Organisation (links Person and Organisation): persID, orgID, Classification, ClassificationScheme, StartDate, EndDate
• Project_Person (links Project and Person): projID, persID, Classification, ClassificationScheme, StartDate, EndDate
The Classification and ClassificationScheme attributes carry the Semantics of each link.
Link Entities
• Organisation_Publication: orgID, publID, Classification, ClassificationScheme, StartDate, EndDate
• Person_Publication: persID, publID, Classification, ClassificationScheme, StartDate, EndDate
• Project_Publication: projID, publID, Classification, ClassificationScheme, StartDate, EndDate
• Project_Person: projID, persID, Classification, ClassificationScheme, StartDate, EndDate
• Person_Organisation: persID, orgID, Classification, ClassificationScheme, StartDate, EndDate
• Project_Organisation: projID, orgID, Classification, ClassificationScheme, StartDate, EndDate
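The link entities above all share the same shape: two entity IDs, a role (Classification within a ClassificationScheme) and a validity period. A minimal sketch of Project_Person with a time-aware role query follows. The role "is manager of" and the scheme name "PPR-Roles" come from the tutorial's role-scheme example; the role "is member of", the IDs, the dates, the composite key and the use of NULL for an open end date are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Project_Person (
        projID               TEXT NOT NULL,
        persID               TEXT NOT NULL,
        Classification       TEXT NOT NULL,  -- the role of the link
        ClassificationScheme TEXT NOT NULL,  -- the scheme the role belongs to
        StartDate            TEXT NOT NULL,  -- ISO 8601 date string
        EndDate              TEXT,           -- NULL means the link is still open
        PRIMARY KEY (projID, persID, Classification, ClassificationScheme, StartDate)
    );
    """
)
conn.executemany(
    "INSERT INTO Project_Person VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("proj-1", "pers-1", "is manager of", "PPR-Roles", "2006-01-01", "2007-06-30"),
        ("proj-1", "pers-2", "is manager of", "PPR-Roles", "2007-07-01", None),
        ("proj-1", "pers-1", "is member of", "PPR-Roles", "2007-07-01", None),
    ],
)


def role_holders(conn, proj_id, role, on_date):
    """Return the persons holding `role` in a project on a given ISO date."""
    cursor = conn.execute(
        "SELECT persID FROM Project_Person "
        "WHERE projID = ? AND Classification = ? "
        "AND StartDate <= ? AND (EndDate IS NULL OR EndDate >= ?) "
        "ORDER BY persID",
        (proj_id, role, on_date, on_date),
    )
    return [row[0] for row in cursor]


print(role_holders(conn, "proj-1", "is manager of", "2007-01-15"))
print(role_holders(conn, "proj-1", "is manager of", "2008-01-01"))
```

Because the start date is part of the key, the same person can hold the same role in the same project during several separate periods, which is what makes the time component usable for archiving.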
Classification Entities (Semantic Layer) Formal Semantics / Values for Link Entities
Classification Entities (Semantic Layer: Abstract)
• Classification: ClassID, ClassSchemeID, Term [language], Description [language], StartDate, EndDate, URI
• ClassificationScheme: ClassSchemeID, Description [language], URI
• Classification_Classification: ClassID1 (Term1), ClassID2 (Term2), ClassSchemeID1 (Schema1), ClassSchemeID2 (Schema2), ClassID (Role), ClassSchemeID (RoleSchema), StartDate, EndDate
• ClassScheme_ClassScheme: ClassSchemeID1, ClassSchemeID2, ClassID (Role), ClassSchemeID (RoleSchema), StartDate, EndDate
Classification Entities (Semantic Layer: Example) Subject Headings
• ClassificationScheme: ClassSchemeID = LT (Language Technology); Description [EN] = The Language Technology Schema is an ontology …; URI = http://www.lt-world.org/
• Classification: ClassID = AE (Answer Extraction); ClassSchemeID = LT (Language Technology); Term [EN] = Answer Extraction; Description [EN] = AE is the method …; StartDate, EndDate = 2008-10-08, open; URI = http://www.lt-world.org/Technologies/IE/AE
• Classification_Classification: ClassID1 = AE (Answer Extraction); ClassID2 = IE (Information Extraction); ClassSchemeID1 = LT (Language Technology); ClassSchemeID2 = LT (Language Technology); ClassID = isA; ClassSchemeID = Taxonomic Relationships; StartDate, EndDate = 2008-10-08, open
• ClassScheme_ClassScheme: ClassSchemeID1 = LT (Language Technology); ClassSchemeID2 = ONT (Ontology); ClassID = isA; ClassSchemeID = Taxonomic Relationships; StartDate, EndDate = 2008-10-08, open
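The subject-heading example can be mirrored in a few lines of Python to show how a Classification_Classification record with the role isA turns a flat term list into a taxonomy. The class and function names below are hypothetical, and the traversal assumes each term has at most one broader term; only the values AE, IE, LT, isA, "Taxonomic Relationships" and the start date are taken from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Classification:
    class_id: str
    scheme_id: str
    term: str


@dataclass(frozen=True)
class ClassificationLink:
    """One Classification_Classification record (hypothetical field names)."""
    class_id1: str
    class_id2: str
    role: str
    role_scheme: str
    start_date: str
    end_date: Optional[str] = None  # None stands for "open"


terms = {
    "AE": Classification("AE", "LT", "Answer Extraction"),
    "IE": Classification("IE", "LT", "Information Extraction"),
}
links = [
    ClassificationLink("AE", "IE", "isA", "Taxonomic Relationships", "2008-10-08"),
]


def broader_terms(class_id: str, links: List[ClassificationLink]) -> List[str]:
    """Follow isA links upward and return the broader terms, nearest first."""
    result: List[str] = []
    seen = {class_id}
    current = class_id
    while True:
        parents = [
            link.class_id2
            for link in links
            if link.class_id1 == current and link.role == "isA"
        ]
        parent = next((p for p in parents if p not in seen), None)
        if parent is None:
            return result
        result.append(parent)
        seen.add(parent)  # guards against cycles in the link records
        current = parent


print([terms[t].term for t in broader_terms("AE", links)])
```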
Classification Entities (Semantic Layer: Example) Role Schemes
• ClassificationScheme: ClassSchemeID = PPR-Roles; Description [EN] = The PPR-Roles Scheme collects the Person-Project Roles in the LT World System; URI = http://www.lt-world.org/internal/PPR-Roles
• Classification: ClassID = PM (is manager of); ClassSchemeID = PPR (Person-Project-Roles); Term [EN] = is manager of; Description [EN] = A project manager is responsible for the successful …; StartDate, EndDate = 2008-10-08, open; URI = i.e.:PPR=PM
• Classification_Classification: ClassID1 = PM (is manager of); ClassID2 = pMM (project management); ClassSchemeID1 = PPR-Roles (Org1-Roles); ClassSchemeID2 = pMM-Roles (Org2-Roles); ClassID = isSimilar; ClassSchemeID = SimilarityRelationships; StartDate, EndDate = 2008-10-08, open
• ClassScheme_ClassScheme: ClassSchemeID1 = PPR-Roles (Org1-Roles); ClassSchemeID2 = pMM-Roles (Org2-Roles); ClassID = isMappedTo; ClassSchemeID = Project MM Mappings; StartDate, EndDate = 2008-10-08, open
Classification Entities (Semantic Layer: Example) Type Schemes
• ClassificationScheme: ClassSchemeID = cfPT; Description [EN] = The CERIF Scheme for the Publication Types has been developed …; URI = http://www.eurocris.org/CERIF/cfPT
• Classification: ClassID = cfART (Article); ClassSchemeID = cfPT (Publication Types); Term [EN] = (not given on the slide); Description [EN] = An article is usually published in …; StartDate, EndDate = 2008-10-08, open; URI = http://www.eurocris.org/CERIF/cfPT=cfART
• Classification_Classification: ClassID1 = cfART (Article); ClassID2 = btART (Article); ClassSchemeID1 = cfPT (Publication Types); ClassSchemeID2 = btPT (Publication Types); ClassID = isEqualTo; ClassSchemeID = EquationRelationships; StartDate, EndDate = 2008-10-08, open
• ClassScheme_ClassScheme: ClassSchemeID1 = cfPT (Publication Types); ClassSchemeID2 = btPT (Publication Types); ClassID = isMappingOf; ClassSchemeID = CERIF-BibTex Mapping; StartDate, EndDate = 2008-10-08, open
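The type-scheme example suggests how a mapping between two schemes could be resolved: look up the Classification_Classification record with the role isEqualTo that connects a term in one scheme to a term in the other. The sketch below is a hypothetical helper, not part of CERIF. The identifiers cfART, btART, cfPT, btPT and isEqualTo come from the slide, while the tuple layout and the decision to treat isEqualTo as symmetric are assumptions.

```python
from typing import List, Optional, Tuple

# (class_id1, scheme_id1, role, class_id2, scheme_id2), modelled on the
# Classification_Classification record from the example above.
Mapping = Tuple[str, str, str, str, str]

MAPPINGS: List[Mapping] = [
    ("cfART", "cfPT", "isEqualTo", "btART", "btPT"),
]


def map_term(
    class_id: str,
    from_scheme: str,
    to_scheme: str,
    mappings: List[Mapping] = MAPPINGS,
) -> Optional[str]:
    """Translate a term between two schemes via isEqualTo records.

    The relationship is treated as symmetric here, which is an assumption.
    """
    for id1, scheme1, role, id2, scheme2 in mappings:
        if role != "isEqualTo":
            continue
        if (id1, scheme1, scheme2) == (class_id, from_scheme, to_scheme):
            return id2
        if (id2, scheme2, scheme1) == (class_id, from_scheme, to_scheme):
            return id1
    return None


print(map_term("cfART", "cfPT", "btPT"))
print(map_term("btART", "btPT", "cfPT"))
```

Since the mapping lives in its own records, either scheme can be maintained independently and only the mapping records need updating when one of them changes.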
Classification Entities (Semantic Layer: Schemes) [Diagram: a Publication is linked through several Publication_Classification records to Classification values, e.g. AccessType=openAccess, Category=commissioned, ReviewType=peer-reviewed, ImpactFactorType=diametric, PublicationType=Journal; Person entities are shown alongside]
Classification Entities (Semantic Layer: Types) Publication Types: Book Chapter Abstract, Book Chapter, Inbook, Anthology, Monograph, Reference Book, Manual, Commentary, Book Chapter Review, Textbook, Book, Annotation, Book Review, Encyclopedia, News Clipping, Journal Article, Otherbook, Report, Journal, Conference Proceedings, Letter, PhD Thesis, Journal Article Abstract, Short Communication, Conference Proceedings Article, Letter to Editor, Doctoral Thesis, Journal Article Review, Poster, Presentation
Classification Entities (Semantic Layer: Roles) Person_Publication Scheme: is author of; is author (numbered) of; is author (percentage) of; is publisher of; is subject of; is editor of; is editor (numbered) of; is translator of; is reviewer of
Classification Entities (Semantic Layer: Roles) Publication_Metrics Roles: number of citations; number of incoming citations; number of self citations; claims IPR of; number of authors; received Best Paper Award; number of external institutes; ISI Impact Factor; number of downloads; is of publication type; number of access; number of requests; area/type of research
Classification Entities (Semantic Layer added Value) • Allows capturing any Schema or Structure • Flat Lists • Taxonomies • Ontologies • Open / Extensible in all directions • New Schemas • New Concepts / Terms • New Relationships • Enables managing • Roles / Types Semantics • Subject Headings • Archiving (Time component) • Allows for simple Mappings between Schemas • Allows for efficient (independent) Maintenance
What for ? Deduction, Inferencing, Reasoning
Record templates:
• Publication: URI, Type, Title, PartOf, PublDate
• Organisation: URI, Name, Abbreviation, Publications, Academic Staff
• Organisation: URI, Name, hasAccess, EndOfAccess, ContactPerson
• CitationTypes: Type, Description
Figures:
• Journal Publications 2007: Institute A = 4, Institute B = 10, Institute C = 9
• Article Requests 2007: Journal X = 4, Journal Y = 0, Journal Z = 15
• PhD Students 2008: Computer Science = 200, Physics = 50, Social Sciences = 9
• Journal Subscriptions: Journal X = 1990 - 2000, Journal Y = 2005 - 2010, Journal Z = 2001 - 2010
• First Author / No of Papers: Person H = 10/35, Person I = 4/12, Person J = 1/10
• Citations in 2007: Paper M (published 2007) = 20, Paper N (published 2004) = 100, Paper O (published 2001) = 0
Derived statements:
• Most Requested Journal: Z
• Subscriptions ending in 2010: Journals Y, Z
• PhD Students 2007 -> 2008: Computer Science = -20, Physics = -5, Social Science = +2
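Two of the deductions on this slide can be reproduced directly from the figures it shows. The sketch below uses the slide's article-request and subscription numbers; the function names and the dictionary layout are hypothetical.

```python
# Figures copied from the slide.
article_requests_2007 = {"Journal X": 4, "Journal Y": 0, "Journal Z": 15}
journal_subscriptions = {
    "Journal X": (1990, 2000),
    "Journal Y": (2005, 2010),
    "Journal Z": (2001, 2010),
}


def most_requested(requests):
    """Return the journal with the highest number of article requests."""
    return max(requests, key=requests.get)


def subscriptions_ending_in(subscriptions, year):
    """Return the journals whose subscription period ends in `year`."""
    return sorted(
        journal for journal, (_start, end) in subscriptions.items() if end == year
    )


print(most_requested(article_requests_2007))
print(subscriptions_ending_in(journal_subscriptions, 2010))
```

Combining the two results supports the kind of statement the slide hints at: the most requested journal, Z, is also one of the subscriptions that ends in 2010.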
What for ? http://www.ist-world.org/ [Figure: Projects (Red Dots) linked with Full Record in Repository; Thematic Areas (Blue Clouds): SEMANTIC, HEALTH, LEGAL, CHANGING, ROADMAP, SOFTWARE] Aim: investigate the thematic range of SSA projects in FP6
What for ? http://www.ist-world.org/ Themes Goals Aim: investigate the thematic range of SSA projects in FP6
What for ? http://www.ist-world.org/ Project Number of joint partners Aim: investigate the collaboration of SSA partners in FP6
What for ? What questions do we expect to answer with CERIF? • How many articles has author X published in 2007 as a first author? • How often have articles by author X been cited? • Did author X publish with institutionally external authors? • In how many FP7 projects does organisation Z participate? • How many publications have resulted from project Y? • How many people have been employed in the course of FP6 projects from the 1st call in the NMS? • How many PhD students have participated in FP6 projects? • How many women have been involved in FP6 projects? • How often have articles from journal A been requested in 2007? • How many articles have been published in the field of B? • …
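Two of the questions above can be phrased as queries over link entities. The following sketch answers "How many articles has author X published in 2007 as a first author?" and "How many publications have resulted from project Y?" on made-up data. The table layout is a simplified assumption; in particular, the AuthorPosition column is a hypothetical stand-in for whatever the "is author (numbered) of" role would record, and all IDs and dates are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Publication (publID TEXT PRIMARY KEY, PublicationDate TEXT);
    CREATE TABLE Person_Publication (
        persID TEXT, publID TEXT, Classification TEXT,
        AuthorPosition INTEGER  -- hypothetical column for the author number
    );
    CREATE TABLE Project_Publication (projID TEXT, publID TEXT);
    """
)
conn.executemany(
    "INSERT INTO Publication VALUES (?, ?)",
    [("publ-1", "2007-03-01"), ("publ-2", "2007-11-15"), ("publ-3", "2006-05-20")],
)
conn.executemany(
    "INSERT INTO Person_Publication VALUES (?, ?, ?, ?)",
    [
        ("pers-X", "publ-1", "is author (numbered) of", 1),
        ("pers-X", "publ-2", "is author (numbered) of", 2),
        ("pers-X", "publ-3", "is author (numbered) of", 1),
    ],
)
conn.executemany(
    "INSERT INTO Project_Publication VALUES (?, ?)",
    [("proj-Y", "publ-1"), ("proj-Y", "publ-2")],
)


def first_author_count(conn, pers_id, year):
    """Count publications of `year` on which the person is the first author."""
    return conn.execute(
        "SELECT COUNT(*) FROM Person_Publication pp "
        "JOIN Publication p ON p.publID = pp.publID "
        "WHERE pp.persID = ? AND pp.Classification = 'is author (numbered) of' "
        "AND pp.AuthorPosition = 1 AND substr(p.PublicationDate, 1, 4) = ?",
        (pers_id, str(year)),
    ).fetchone()[0]


def publications_from_project(conn, proj_id):
    """Count distinct publications linked to a project."""
    return conn.execute(
        "SELECT COUNT(DISTINCT publID) FROM Project_Publication WHERE projID = ?",
        (proj_id,),
    ).fetchone()[0]


print(first_author_count(conn, "pers-X", 2007))
print(publications_from_project(conn, "proj-Y"))
```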
The CERIF Evolution
• Timeline: 1987, 1991, 2000, 2006, 2008
• Background: EU Working Group on Research Databases; Workshop; Similar Ideas: UN/UNESCO, OECD, CODATA
• CERIF 91: PROJECT
• ERGO project: Acronym: ERGO; Participants: Keith Jeffery, Anne Asserson, many more; Organisations: Rutherford Appleton, University of Bergen, …
• CERIF 2000 Model: PROJECT, PERSON, OrgUnit, RESULTS, EQUIPMENT, EXPERTISE, CLASSIFICATION, Roles
• CERIF 2006 / 2008 Model: CORE, Link, Semantics, Language, 2ndLevel
Characteristics listed along the timeline (three groups, in slide order):
• Data Model (RDBMS, OO, IR); Model Normalization; Robust Structure; Extensible Structure; Consistent Structure; Semantic Layer; XML Exchange Specification; Connectivity to Repositories (Elaboration on Publication)
• Data Model (RDBMS, OO, IR); Multilinguality; Controlled Vocabulary; Roles / Types; User-driven
• EC Recommendation to Member States; Networking of DBs; Exchange of Records; Recommendation to Member States
CERIF 91 • published in a first release • recommended to Member States • to harmonise databases on research projects • ease exchange of comparable information • guidelines for building research databases • only dealt with research project records • demonstrated in the ERGO pilot project • access to more than 80,000 project records • from more than 20 national information services • demonstrated the feasibility of exchange • identified the need for more detailed guidelines • confirmed the need to revise CERIF and extend it to other types of research information, not only projects • revision activities started in 1997 co-ordinated by the EC • led to CERIF 2000
CERIF 2000 • a full CRIS data model with flexibility to accommodate many database structures • a base framework for data exchange • multilingual subject indexing (Ortelius Thesaurus) • recommendations for controlled attribute values • reflection on user groups and requirements • types of research information • metadata environment as a uniform summary view • extensions to • Organisations • Persons • Results: Products, Patent, Publication • Expertise • Equipment and Facilities
What is going on ? JISC Report from April 2008 "Metadata for digital libraries: state of the art and future directions" by Richard Gartner http://www.jisc.ac.uk/media/documents/techwatch/tsw_0801pdf.pdf • Many available Schemas (DC, METS, MODS, …) • Each schema was developed on its own and not designed as part of an overall architecture to cover integrated object entities • JISC therefore recommends overcoming the problem through best-practice guidelines and pragmatic application • Issues of duplicate information (overlap in sections of metadata) need rules and are currently being addressed by the library community in good-practice guidelines
What is going on ? JISC Report from April 2008 “Metadata for digital libraries: state of the art and future directions” by Richard Gartner http://www.jisc.ac.uk/media/documents/techwatch/tsw_0801pdf.pdf • Descriptive Metadata (intellectual contents) • Administrative Metadata (technical metadata [file formats], rights management, provenance [info on creation, subsequent treatment, responsibility, …]) • Structural Metadata (internal structure of items: e.g.: page order, …) • METS • DIDL • …
What is going on ? JISC Report from April 2008 "Metadata for digital libraries: state of the art and future directions" by Richard Gartner http://www.jisc.ac.uk/media/documents/techwatch/tsw_0801pdf.pdf • XML is of great importance to embed and make use of namespaces • Combining Metadata standards, even a limited set such as described above, will always be messier than utilising a single standard that combines their taxonomic powers and resolves any potential clashes or duplications between them. • Integration by itself would of course be of little consequence if the standards themselves failed to address the metadata needs of the digital library community. In this respect, the provenance of each standard is of some importance. All have been constructed by authoritative standard setters within their communities. • Most of the mentioned standards have proved their ability to meet the requirements of major and highly complex digital collections.
What is going on ? Source: http://maps.repository66.org/; Reported on: http://www.sparceurope.org/
What CERIF aims for [Diagram: research entities (Funding Programme, Organisation, Person, Project, Service, Skills, Publication, Equipment, CV, Patent, Product, Event, Classification) connected through Semantics] Source: http://maps.repository66.org/; Reported on: http://www.sparceurope.org/
What CERIF aims for [Diagram: research entities (Funding Programme, Organisation, Person, Project, Service, Skills, Publication, Equipment, CV, Patent, Product, Event, Classification) connected through Semantics] • Enabling the ERA eInfrastructure • Standardization / Integration / Interchange • Added-Value Services • Middle (Interoperability)-Layer for EU Research Information
Activities • UK: Research Councils specified to use CERIF as the format for IT processes and MM information • UK: STFC (Corporate Data Repository) • BE: Flanders – CERIF as Standard Interchange Format • DK: Danish Universities PURE -> CERIF • EUROPE • ESF: CERIF for IS under discussion • CORDIS, EC R&D Service: Asked for CERIF presentation • EuroHORCS: Recommendation for CERIF; ESF joined as a euroCRIS member
Activities • IST World SSA (project) • Videolectures.net (Teaching Videos) • BioDiversa ERANET (project) • IWETO (BE): Integrating Flemish Research Information • FRIDA (NO): Joint university CRIS • Fdok (NO): University of Bergen, results • METIS (NL): currently used by Dutch Universities • STFC (UK): Corporate Data Repository • HUNCRIS (HU): Access to R&D in Hungary • SICRIS (SI): Access to University Research in Slovenia • SRIS (UK): Scottish Research Information Systems, public research in Scotland • AURIS-MM (AT): Provides access to Austrian University Research extended with multimedia • ICERIS (IS): Access to Information on Icelandic Research Projects & R&D Results • CRIS-MER (EC): Research information on Migration and ethnic Relations (planned)