
Research Information Linked Open Data Store

euroCRIS members meeting, Bonn, May 2013.


Presentation Transcript


  1. Research Information Linked Open Data Store: euroCRIS members meeting, Bonn, May 2013

  2. Overview
     • Needs & Drivers
     • Information and data sources
        • Structured
        • Unstructured
     • Architecture
        • Planned
        • Realised
     • Tools

  3. Project
     • Partners
        • Knowledge Management unit, EWI
        • IBM Belgium
     • Goals
        • Merge all sources into one open environment.
        • Apply entity resolution techniques to remove data silos
        • Crawling and content analysis of full-text elements
        • Build and test the proposed Pilot Architecture
        • Information integration from structured and unstructured data in one container
        • Build a number of visualisations of the information
        • Develop a roadmap towards the Operational Architecture
     • Timing
        • 4 months starting from January 2013
     • Cost
        • 124k euro

  4. Needs & drivers
     • Better information: correct, up to date, complete
     • Open FRIS data for services and application development
     • Flemish government Open Data policy
     • Maximum reuse of components
     • Increase strategic intelligence
     • Maximum reuse of data
     • Policy monitoring: efficient & effective
     • Connect data silos
     • More information services
     • Reduce system costs

  5. FRIS structured information

  6. FRIS Unstructured Data

  7. Information and Data sources
     • Structured Data
        • FRIS research portal database
           • Format: CERIF2006 database
           • Coverage: all universities and 1 university college
        • 4 university OARs
           • Format: MODS records
           • Coverage: X publication records, X full-text resources
        • VABB-SSH: publication monitoring data set on Social Sciences and Humanities
           • Format: MODS records
           • Coverage: all universities
     • Semantics and information model
        • Business Semantics Glossary
        • FRIS model: CERIF2006
        • Semantics: Entity Classifications

  8. Information and Data sources
     • Unstructured Data
        • All textual information from the structured data
           • Project Abstracts
           • Publication Abstracts
           • Organisation Activity descriptions
        • Full text of publications
        • Websites
           • Project
           • Researcher
           • Organisation

  9. Links and Locators
     • Access to unstructured data
        • Textual elements in CERIF model
           • Project Abstracts
           • Publication Abstracts
           • Organisation Activity descriptions
        • Websites
           • URI fields in CERIF entities
        • Links to full text
           • Resource links in MODS records
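
The "resource links in MODS records" can be illustrated with a minimal Python sketch. It assumes the repositories expose full-text links in the standard MODS `location/url` element; the sample record, its URL, and the function name `extract_locators` are hypothetical and not taken from the project.

```python
import xml.etree.ElementTree as ET

# Namespace of MODS version 3 records.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record, for illustration only.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example publication</title></titleInfo>
  <abstract>Example abstract text.</abstract>
  <location>
    <url>http://repository.example.org/record/1/fulltext.pdf</url>
  </location>
</mods>"""


def extract_locators(mods_xml: str) -> dict:
    """Return the title, abstract and resource URLs of one MODS record."""
    root = ET.fromstring(mods_xml)
    title = root.findtext("mods:titleInfo/mods:title", default="", namespaces=MODS_NS)
    abstract = root.findtext("mods:abstract", default="", namespaces=MODS_NS)
    urls = [
        el.text.strip()
        for el in root.iterfind("mods:location/mods:url", MODS_NS)
        if el.text
    ]
    return {"title": title, "abstract": abstract, "urls": urls}


if __name__ == "__main__":
    print(extract_locators(SAMPLE_MODS))
```

The abstract is the kind of textual element that would feed content analysis, and each URL is a candidate for full-text crawling.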

  10. Scope

  11. Some numbers
     • CERIF records
        • Person: 22.006 (FRIS) + 1.454.208 (OAI without resolution)
        • Project: 24.634 (FRIS)
        • Organisation: 1.398 (OAI) + 2.022 (FRIS)
        • Publications: 3.596 (FRIS)
     • MODS records
        • OARs: 598.035 (OAI) + VABB database
        • Publication full text: 45.294 (OAI)

  12. Planned Architecture
     • Identifiers & Entity Resolution
     • Content Analysis
     • Concept Extraction
     • Visualisation
     • Triple Store
     • Structured Data input
     • Operational Store
     • Semantic control

  13. RELOD Structured Data Architecture

  14. OAR Harvesting Architecture
     • Crawler management
     • OAI-PMH Crawler
     • UGent
     • MODS to CERIF conversion
     • CERIF database
     • D2R transformation
     • UHasselt
     • …
     • XML
     • VABB
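
The OAI-PMH crawler step can be sketched in Python as follows. This is not the project's crawler: the metadata prefix `mods`, the function names, and the injected `fetch` callable are assumptions made for illustration. Only the `ListRecords` verb and the `resumptionToken` paging mechanism come from the OAI-PMH protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="mods", resumption_token=None):
    """Build a ListRecords request URL; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.iterfind(
            "oai:ListRecords/oai:record/oai:header/oai:identifier", OAI_NS
        )
    ]
    token = root.findtext(
        "oai:ListRecords/oai:resumptionToken", default="", namespaces=OAI_NS
    )
    return identifiers, (token.strip() or None)


def harvest(base_url, fetch, metadata_prefix="mods"):
    """Yield all record identifiers, following resumption tokens until none is returned.

    `fetch` is any callable that takes a URL and returns the response body as text.
    """
    token = None
    while True:
        page = fetch(list_records_url(base_url, metadata_prefix, token))
        identifiers, token = parse_page(page)
        yield from identifiers
        if token is None:
            break
```

Passing `fetch` in as a parameter keeps the paging logic testable without network access; in practice it could wrap `urllib.request.urlopen`.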

  15. RELOD Architecture

  16. Architecture - Tools & Standards BSG SBVR Jena HTTP D2R TDB REST Java Java SPARQL SKOS OWL RDFS WEB 2.0 APACHE FUSEKI Oracle TOMCAT RDF CERIF SILK R2R SIEVE ICA ICC HARVESTER OAI-PMH LDIF UIMA MODS

  17. Some numbers
     • Entities
        • Projects: 24.634 (FRIS)
        • Persons: 22.006 (FRIS) + 1.454.208 (OAI, without resolution!)
        • Publications: 598.035 (OAI) + 3.596 (FRIS)
           • With full text: 45.294 (OAI)
        • OrgUnit: 1.398 (OAI) + 2.022 (FRIS)
        • Recognised author affiliations from full text: 55.662
     • Triple Store
        • Triples FRIS + OAI: 57M
        • Triples text mining (author recognition + lemmas): 144M
        • Still without inference (no inferred triples)

  18. Analysis - Visualisation

  19. Visualisations
     • Two test visualisations built so far:
        • Word cloud for person
           • http://ewisclod3.vlaanderen.be/words/
        • Persons related to Concepts
           • http://ewisclod3.vlaanderen.be/persons/
     • New visualisations will be built on well-defined use cases
        • Tuning the content analytics to the case
        • Supervised learning for specific domains
        • Give a contextual overview of research from the last 10 years on social security issues in Belgium
        • Annual report on research in the domain of renewable energy

  20. Entity resolution
     • A few tools tested
        • Silk Link Discovery Framework
           • Used to map authors from the OAR harvest onto Persons from the CERIF sources.
           • Experimented with
              • manual construction of matching rules via the Silk workbench
              • active learning combined with the Silk generic algorithms
              • several metrics on the text dimensions: Levenshtein, tf-idf, Jaro, Jaccard, in combination with numerical and temporal dimensions
           • Results still have to be validated in detail.
        • Tests with OKKAM are planned
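
Two of the text metrics named above, Levenshtein and Jaccard, can be sketched in plain Python together with a simple temporal check. This is not Silk's implementation or the project's rule set: the combination in `is_match`, the threshold of 0.8, and the 5-year gap are hypothetical values chosen only to show how text and temporal dimensions might be combined.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalise the edit distance to a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased word sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_match(name_a, name_b, year_a, year_b, threshold=0.8, max_year_gap=5):
    """Hypothetical rule: names must be similar AND activity years close together."""
    a, b = name_a.lower(), name_b.lower()
    text_score = max(levenshtein_similarity(a, b), jaccard(a, b))
    return text_score >= threshold and abs(year_a - year_b) <= max_year_gap
```

Jaccard on word sets tolerates swapped first and last names, while Levenshtein tolerates small spelling differences, which is one reason to evaluate several metrics side by side.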

  21. Architecture Roadmap Elements
     • (optional) Replace D2R with standard: R2RML
     • Full-CERIF automatic D2R template generation
     • Support incremental CERIF/RDF loading
     • Integration of Data Governance Center via the API
     • Complete modelling of CERIF and Semantics in Data Governance Center
     • Full-CERIF automatic ontology template generation
     • manual / automated

  22. D2R Views
        • FRIS: http://ewisclod3.vlaanderen.be/d2rq/fris/
        • OAI-PMH: http://ewisclod3.vlaanderen.be/d2rq/oai/
        • Text Mining: http://ewisclod3.vlaanderen.be/d2rq/tm/
     • SPARQL
        • Test page: http://ewisclod3.vlaanderen.be/ewilod/html/sparql-test.html
        • Endpoint (query only): http://ewisclod3.vlaanderen.be/ewilod/sparql
     • RESTful API (GET)
        • Resource base URL: http://ewisclod3.vlaanderen.be/ewilod/lod/0.1/resource/
        • Ontology base URL: http://ewisclod3.vlaanderen.be/ewilod/lod/0.1/ontology
     • Triple store graph URIs
        • FRIS: http://ewisclod3.vlaanderen.be/ewilod/lod/0.1/graphs#fris
        • OAI-PMH: http://ewisclod3.vlaanderen.be/ewilod/lod/0.1/graphs#oai
        • Text Mining: http://ewisclod3.vlaanderen.be/ewilod/lod/0.1/graphs#tm
        • Mappings: http://ewisclod3.vlaanderen.be/ewilod/lod/0.1/graphs#ld
     • LDIF
        • Status monitor: http://ewisclod3.vlaanderen.be/ldif/status/
     • Silk
        • Workbench: http://localhost:8080 (via SSH tunnel)
     • Visualisations
        • Index page: http://ewisclod3.vlaanderen.be/ewilod/html/vis/index.html
        • The visualisations: http://ewisclod3.vlaanderen.be/persons/ and http://ewisclod3.vlaanderen.be/words/
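
As an illustration of how the query-only SPARQL endpoint and the named graph URIs fit together, the Python sketch below builds a GET request that counts the triples in the FRIS graph. The endpoint and graph URI are the ones listed on the slide; the query itself and the function names are assumptions, and the pilot server may no longer be reachable, so the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://ewisclod3.vlaanderen.be/ewilod/sparql"
FRIS_GRAPH = "http://ewisclod3.vlaanderen.be/ewilod/lod/0.1/graphs#fris"


def count_triples_query(graph_uri: str) -> str:
    """SPARQL 1.1 query counting all triples in one named graph."""
    return "SELECT (COUNT(*) AS ?n) WHERE { GRAPH <%s> { ?s ?p ?o } }" % graph_uri


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_request(ENDPOINT, count_triples_query(FRIS_GRAPH))
    print(req.full_url)
```

To run the query, the request could be passed to `urllib.request.urlopen` and the response parsed with `json.load`; swapping in the `#oai` or `#tm` graph URI would count the other data sets.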

