Semantic Web and Digital Library Management Ludovic Deravet Software Architect @ I.R.I.S. S&E using Fedora-Commons
Semantic Web and Digital Library Management PART 1: INTRODUCTION
Evolution of the Web [Figure: volume of data plotted against the evolution of web technologies, by decade (1980-1990, 1990-2000, 2000-2010, 2010-2020), through the stages Desktop, Web, Semantic Web and WebOS.]
Managing and Searching Information Search Result(s)
Semantic Web Foundations
URI: http://www.irislink.com/#company
RDF triples (labels on the slide: I.R.I.S., D.M. experts), RDFS, OWL
SPARQL:
  SELECT ?subject ?label
  WHERE {
    ?subject rdfs:subClassOf ?object .
    OPTIONAL { ?subject rdfs:label ?label }
  }
RDF/XML:
  <rdf:RDF … xmlns:contact="http://.../contact#">
    <contact:Person rdf:about="http://.../contact#me">
      <contact:fullName>…</contact:fullName>
      <contact:mailBox rdf:resource="mailto:xxx@yyy"/>
    </contact:Person>
  </rdf:RDF>
Fedora-Commons Features [Figure: Fedora Repository modules: Dissemination, Validation, Security, Resource Index, Storage, Management, Registry, CMA. Also shown: RDF, Files, RDBMS.]
How can we help you? I.R.I.S. S&E – International Organisations
Semantic Web and Digital Library Management PART 2: ADVANCED
Semantic Web and Digital Library Management DIGITAL LIBRARY
What is Digital Library Management?
A solution to meet the needs for:
• Bulk load of digital assets
• Cataloguing
• Editing
• Storing
• Searching
Evolution of the Web [Figure: volume of data plotted against the evolution of web technologies, by era: 1980-1990 (PC era), 1990-2000 (Web 1.0), 2000-2010 (Web 2.0), 2010-2020 (Web 3.0), leading from Desktop through WWW and the Semantic Web to WebOS. Technologies shown: File Systems, Databases, File Servers, FTP, Email, Groupware, Keyword Search, Windows, MacOS, SQL, SGML, HTTP, HTML, Websites, Java, Flash, XML, RSS, Wikis, Weblogs, Lightweight collaboration, Social Networking, SaaS, Distributed Search, RDF, SPARQL, OWL, Semantic Search, Semantic databases, Intelligent personal agents.]
Problem – Searching and Managing Information
• Synonyms
  • are spelled differently but have the same (or nearly the same) meaning
• Homonyms
  • sound alike but have different meanings
  • most of the time, they are spelled differently
• Languages
  • might require a lot of maintenance
  • quality is not always the same in each language
• Parametric Search
  • It’s difficult to find things, especially something specific
  • Too few parameters = too many search results
  • Too many parameters = no search results
Problem – Searching and Managing Information
• Time spent
  • users spend too much time searching for what they need
• Data reusability
  • limited ability to reuse data
• Managing the information is complex
  • Within the same company, each department often manages its own information
  • Each department might have its own way of solving the problem
  • Technologies are brought in to try to solve the original problem (e.g. MDM)
  • A high volume of information requires human management of the information
  • Hierarchical solutions classify the information
  • Horizontal solutions use tags
Semantic Web and Digital Library Management SEMANTIC WEB
What is the Semantic Web?
The idea behind it is “quite” simple:
• electronic information will become unambiguous
• data will become findable
• data will be reusable
• data will be interoperable
• systems will be flexible
• information will be available in real time
Foundations of the Semantic Web
• URIs for everything
• Triples: <subject> <predicate> <object>
• Models and technologies (e.g. RDF)
• Data exchange formats (e.g. RDF/XML, N-Triples)
• Schema and ontology languages (e.g. RDFS, OWL)
• SPARQL as the query language
Foundations of Semantic Web (example)
Albert is the father of Philippe
SUBJECT: http://www.belgium.be/person/albert/profile.html
PREDICATE: http://www.belgium.be/rdf/relationship#fatherof
OBJECT: http://www.belgium.be/person/philippe/profile.html
In RDF notation: <rdf:RDF xmlns:be="http://www.belgium.be/rdf/relationship#">
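The subject/predicate/object split above can be sketched in a few lines of Python. This is only an illustration: the `Triple` type and the `to_ntriples` helper are hypothetical names, and the three URIs are read off the slide, so their exact paths are an assumption.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement whose three terms are all URIs."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a URI-only triple as a single N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# URIs as read from the slide; the exact paths are assumptions.
ALBERT = "http://www.belgium.be/person/albert/profile.html"
PHILIPPE = "http://www.belgium.be/person/philippe/profile.html"
FATHER_OF = "http://www.belgium.be/rdf/relationship#fatherof"

# "Albert is the father of Philippe"
statement = Triple(ALBERT, FATHER_OF, PHILIPPE)
print(to_ntriples(statement))
```

N-Triples is used here because it maps one statement to one line, which makes the triple structure easier to see than in RDF/XML.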
Foundations of Semantic Web (example)
RDFS:
  be:King rdfs:subClassOf be:Person
  be:Prince rdfs:subClassOf be:Person
  dc:subject rdf:type rdf:Property
SPARQL:
  PREFIX be: <http://www.belgium.be/ontology>
  SELECT ?firstname ?lastname
  WHERE {
    ?person a be:Person .
    ?person be:firstname ?firstname .
    ?person be:lastname ?lastname
  }
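To see how the RDFS statements and the SPARQL query interact, here is a minimal in-memory sketch. It assumes the store applies rdfs:subClassOf inference (plain SPARQL matching does not do this by itself), and everything prefixed `ex:` as well as the name values are made-up sample data, not taken from the slides.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# The two subclass statements come from the slide; the rest is hypothetical.
TRIPLES = {
    ("be:King", SUBCLASS_OF, "be:Person"),
    ("be:Prince", SUBCLASS_OF, "be:Person"),
    ("ex:albert", RDF_TYPE, "be:King"),
    ("ex:albert", "be:firstname", "Albert"),
    ("ex:albert", "be:lastname", "Dupont"),
    ("ex:philippe", RDF_TYPE, "be:Prince"),
    ("ex:philippe", "be:firstname", "Philippe"),
    ("ex:philippe", "be:lastname", "Dupont"),
    ("ex:report", RDF_TYPE, "be:Document"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def instances_of(cls, triples):
    """Subjects whose declared type is cls or one of its subclasses."""
    return {
        s for s, p, o in triples
        if p == RDF_TYPE and cls in superclasses(o, triples)
    }


def value_of(subject, predicate, triples):
    """First object found for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def select_names(triples):
    """Mimic the slide's query: first and last name of every be:Person."""
    rows = []
    for person in sorted(instances_of("be:Person", triples)):
        first = value_of(person, "be:firstname", triples)
        last = value_of(person, "be:lastname", triples)
        if first is not None and last is not None:
            rows.append((first, last))
    return rows


print(select_names(TRIPLES))
```

The king and the prince are both returned although neither is typed as be:Person directly; that is the practical effect of the two rdfs:subClassOf statements.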
Semantic Web and Digital Library Management FEDORA-COMMONS OVERVIEW
Semantic Web and Digital Library Management FEDORA-COMMONS IN DETAIL
Fedora-Commons Features [Figure: Fedora Repository modules: Dissemination, Validation, Security, Resource Index, Storage, Management, Registry, CMA. Also shown: RDF, Files, RDBMS.]
Semantic Web and Digital Library Management Fedora Repository Modules Dissemination Validation Security RI Store Management Registry CMA
CMA – Content Model Architecture
• Data (Digital Object) fedora-model:hasModel Content Model
• Content Model fedora-model:hasService Service Definition
• Service Deployment fedora-model:isDeploymentOf Service Definition
• Service Deployment fedora-model:isContractorOf Content Model
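The four CMA relationships can be followed programmatically to find out which service deployments apply to a given data object. The sketch below is an illustration only: the `demo:` PIDs and the helper names are hypothetical, and the relationships are held in a plain list rather than read from a repository.

```python
HAS_MODEL = "fedora-model:hasModel"
HAS_SERVICE = "fedora-model:hasService"
IS_DEPLOYMENT_OF = "fedora-model:isDeploymentOf"
IS_CONTRACTOR_OF = "fedora-model:isContractorOf"

# Hypothetical objects: one data object, one content model,
# one service definition and one service deployment.
RELATIONS = [
    ("demo:report-1", HAS_MODEL, "demo:cmodel-document"),
    ("demo:cmodel-document", HAS_SERVICE, "demo:sdef-render"),
    ("demo:sdep-render-pdf", IS_DEPLOYMENT_OF, "demo:sdef-render"),
    ("demo:sdep-render-pdf", IS_CONTRACTOR_OF, "demo:cmodel-document"),
]


def objects(subject, predicate, relations):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in relations if s == subject and p == predicate]


def subjects(predicate, obj, relations):
    """All subjects s such that (s, predicate, obj) is asserted."""
    return [s for s, p, o in relations if p == predicate and o == obj]


def deployments_for(pid, relations):
    """Map each service definition available to `pid` to its deployments.

    A deployment qualifies when it deploys the service definition AND
    is a contractor of the data object's content model.
    """
    result = {}
    for cmodel in objects(pid, HAS_MODEL, relations):
        contractors = set(subjects(IS_CONTRACTOR_OF, cmodel, relations))
        for sdef in objects(cmodel, HAS_SERVICE, relations):
            deployers = set(subjects(IS_DEPLOYMENT_OF, sdef, relations))
            result.setdefault(sdef, set()).update(contractors & deployers)
    return result


print(deployments_for("demo:report-1", RELATIONS))
```

The data object never points at a deployment directly; it only names its content model, which is what lets one deployment serve every object of that model.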
Digital Objects Relationships - Example [Figure: graph of related digital objects. Node labels: Document Server, Windows, Operating System, Address, Rights, IRIS Corporate, I.R.I.S. Group, Compression, iHQC, Documents. Relationships: ns:isRunningOn, ns:hasAddress, ns:hasLicense, ns:hasText, ns:hasLogo, ns:hasName, ns:hasPhotoLocation, ns:supportFormats, ns:hasCompression, dc:title.]
Dissemination (Example)
Digital object: Title: The ‘Great Migrations’, Owner: NGC, Date: 06/11/2010
1) http://website/pid/pdf
2) Calls service with PID and format
3) Returns PDF representation (dissemination) of the requested resource
Other labels in the figure: THUMBNAIL, VIDEO, XML, WSDL, Transformation Service, High Speed Videos Streaming platform, Archive notice
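The three dissemination steps can be sketched as a small dispatcher. This is a simplified stand-in, not Fedora's implementation: `TRANSFORMERS`, `to_pdf` and `disseminate` are hypothetical names, and the "transformation service" is a local function instead of a WSDL-described remote service.

```python
from urllib.parse import urlparse


def to_pdf(pid: str) -> bytes:
    """Placeholder for a transformation service; returns fake PDF bytes."""
    return f"PDF representation of {pid}".encode()


# Hypothetical registry: requested format -> transformation service.
TRANSFORMERS = {"pdf": to_pdf}


def disseminate(url: str) -> bytes:
    """Resolve a URL shaped like http://website/<pid>/<format>."""
    # Step 1: split the request path into PID and format.
    path = urlparse(url).path.strip("/")
    pid, _, fmt = path.rpartition("/")
    if not pid or fmt not in TRANSFORMERS:
        raise ValueError(f"cannot disseminate {url!r}")
    # Step 2: call the service with the PID and the format.
    # Step 3: return the representation it produced.
    return TRANSFORMERS[fmt](pid)


print(disseminate("http://website/demo:42/pdf"))
```

The caller only sees a URL and gets a representation back; which service produced it stays hidden behind the registry lookup.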
Stores (Fedora Repository Modules: Storage)
• Default store: file system
• Amazon
  • Scalable (no limit on the number of files)
  • Reliable (SLA 99.99%)
  • No file-system limitation
  • Cost management (pay for what you use)
• iRODS
  • iRODS handles the digital objects
  • Fedora-Commons handles the metadata / management
  • Distributed management system
  • Stores can be located in different places (geographically)
• StorageTek 5800 System
  • Distributed management storage
  • Manages datasets stored in a wide range of data stores (file system, network, databases…)
  • Large datasets
Resource Index (Fedora Repository Modules: RI) Triple store: Mulgara
Resource Index (RI) - Example [Figure: graph of the triples indexed for a video series. Nodes: Library (L1), Video (V1), Category (C1), Episodes (E1, E2, E3). Literal values: Stephen Hawking’s Universe; English; Explores the greatest mysteries of the cosmos.; Stephen Hawking; Science; The Story of Everything; Time Travel; Aliens; Blu-ray. Predicates: dc:title, dc:language, dc:description, dc:author, ns:type, ns:format, ns:isMemberOf, ns:isCategoryOf, ns:isCollectionOf.]
Resource Index (cont’d) iTQL queries (http://docs.mulgara.org/itqlcommands/index.html)
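As an illustration of what an iTQL query against the Resource Index might look like, the sketch below builds one for the video example and wraps it in a request URL for the Fedora 3.x `risearch` endpoint. Several things are assumptions: the server address, the `http://example.org/ns#` namespace standing in for `ns:`, the PID `demo:V1`, and the reading that the video is the subject of `ns:isCollectionOf` with episodes as objects. The code only builds the URL and never contacts a server.

```python
from urllib.parse import urlencode

# Assumptions: a local Fedora 3.x server and a made-up namespace for "ns:".
RISEARCH_URL = "http://localhost:8080/fedora/risearch"
NS = "http://example.org/ns#"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def episodes_query(video_pid: str) -> str:
    """iTQL: titles of the episodes that a video is a collection of."""
    video = f"<info:fedora/{video_pid}>"
    return (
        "select $episode $title from <#ri> "
        f"where {video} <{NS}isCollectionOf> $episode "
        f"and $episode <{DC_TITLE}> $title"
    )


def risearch_request(query: str, limit: int = 100) -> str:
    """Build the GET URL for a tuple query against the Resource Index."""
    params = {
        "type": "tuples",
        "lang": "itql",
        "format": "CSV",
        "limit": str(limit),
        "query": query,
    }
    return f"{RISEARCH_URL}?{urlencode(params)}"


url = risearch_request(episodes_query("demo:V1"))
print(url)
```

The two `where` clauses joined by `and` share the `$episode` variable, which is how the query hops from the video to each episode and then to its title.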
Validation
• Applied when managing digital objects:
  • FOXML 1.0
  • FOXML 1.1
  • METS 1.0
  • METS 1.1
  • Atom
• Uses Schematron
  • rule-based validation language
  • structural language expressed in XML
<sch:pattern name="Preliminary Object Checks" id="preliminary">
  <sch:rule context="foxml:datastream[@ID='AUDIT']">
    <sch:assert test="count(foxml:datastreamVersion) = 1">The AUDIT Datastream can only have ONE version since it is a non-versionable datastream. (foxml:datastreamVersion)</sch:assert>
  </sch:rule>
</sch:pattern>
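To make the rule concrete, here is a hand-written Python equivalent of that single assertion. It is not a Schematron engine, only the same check expressed with the standard library; the sample documents and the `audit_errors` name are hypothetical.

```python
import xml.etree.ElementTree as ET

FOXML_NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}

MESSAGE = "The AUDIT Datastream can only have ONE version."


def audit_errors(foxml_text: str) -> list:
    """Return one message per AUDIT datastream that breaks the rule."""
    root = ET.fromstring(foxml_text)
    errors = []
    # Same context as the rule: foxml:datastream[@ID='AUDIT']
    for ds in root.findall(".//foxml:datastream[@ID='AUDIT']", FOXML_NS):
        # Same test as the assert: count(foxml:datastreamVersion) = 1
        versions = ds.findall("foxml:datastreamVersion", FOXML_NS)
        if len(versions) != 1:
            errors.append(MESSAGE)
    return errors


# Hypothetical, heavily trimmed FOXML documents.
VALID = """<foxml:digitalObject
    xmlns:foxml="info:fedora/fedora-system:def/foxml#" PID="demo:1">
  <foxml:datastream ID="AUDIT">
    <foxml:datastreamVersion ID="AUDIT.0"/>
  </foxml:datastream>
</foxml:digitalObject>"""

INVALID = VALID.replace(
    '<foxml:datastreamVersion ID="AUDIT.0"/>',
    '<foxml:datastreamVersion ID="AUDIT.0"/>'
    '<foxml:datastreamVersion ID="AUDIT.1"/>',
)

print(audit_errors(VALID))
print(audit_errors(INVALID))
```

Keeping the rule in Schematron, as the slide does, has the advantage that new checks can be added as XML without touching code like this.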
Security
• Legacy Authentication and Authorization
  • Authorization: XACML (from Sun)
  • Authentication: using server filters
• FeSL
  • will replace XACML in a future release of Fedora-Commons
  • based on JAAS (Java Authentication and Authorization Service)
Management
• Primary APIs
  • REST API (HTTP)
  • API-A and API-M (SOAP)
• Secondary APIs
  • Resource Index with iTQL and SPARQL (HTTP)
  • OAI-PMH for metadata harvesting across repositories (HTTP)
• Third-Party APIs
  • MediaShelf, with a Java client API
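The HTTP-based APIs listed above are all reached through plain URLs. The sketch below builds a few of them for the REST API and for OAI-PMH, assuming a Fedora 3.x server at `http://localhost:8080/fedora` and the hypothetical PID `demo:42`; paths can differ between versions and installations, so treat them as assumptions to verify against your server's documentation. No request is actually sent.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080/fedora"  # assumed default install


def object_profile_url(pid: str) -> str:
    """REST API: profile of one digital object, as XML."""
    return f"{BASE_URL}/objects/{quote(pid, safe=':')}?format=xml"


def datastream_content_url(pid: str, ds_id: str) -> str:
    """REST API: raw content of one datastream of an object."""
    return (
        f"{BASE_URL}/objects/{quote(pid, safe=':')}"
        f"/datastreams/{quote(ds_id)}/content"
    )


def find_objects_url(terms: str, max_results: int = 20) -> str:
    """REST API: keyword search returning the PID and title fields."""
    params = {
        "terms": terms,
        "pid": "true",
        "title": "true",
        "maxResults": str(max_results),
        "resultFormat": "xml",
    }
    return f"{BASE_URL}/objects?{urlencode(params)}"


def oai_list_records_url(metadata_prefix: str = "oai_dc") -> str:
    """OAI-PMH: harvest all records in the given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{BASE_URL}/oai?{urlencode(params)}"


for url in (
    object_profile_url("demo:42"),
    datastream_content_url("demo:42", "DC"),
    find_objects_url("migrations"),
    oai_list_records_url(),
):
    print(url)
```

Each helper returns a string that can be handed to any HTTP client, which keeps the sketch independent of a particular client library.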
Semantic Web and Digital Library Management WHO’S GONE FEDORA-COMMONS and USER COMMUNITY