Metadata Schema for CERIF-2000 Andrei Lopatenko Vienna University of Technology http://derpi.tuwien.ac.at/~andrei
What we have now • An SGML DTD to describe CERIF data (old version of CERIF) • SGML is used for data exchange between national institutions and ERGO • The SGML DTD covers only the old version of CERIF (projects) • Strictly defined structure and semantics of elements
What we need • A metadata format to describe CERIF-2000 data (with its new entities and attributes) • Because data descriptions differ between countries and institutions, it should be possible to extend the schema and to express the meaning of the new elements
Possible solution • Semantic Web: • RDF (Resource Description Framework) to encode data • DAML + OIL (DARPA Agent Markup Language + Ontology Inference Layer) to express the semantics of classes and attributes
Advantages • A direct path to a Knowledge Management solution • A possible way to solve the problem of differing vocabularies and classifications; ready to work in a heterogeneous distributed environment • Easier to implement than KIF/KQML or Description Logic solutions
Advantages • XML experience can be reused when developing SW solutions • XML compatibility keeps the solution close to industry solutions • The semantic richness of the Semantic Web (SW) makes it possible to develop advanced information retrieval over SW-encoded data • Existing tools can be applied
Disadvantages • XML experience is not enough: developers must be trained in SW • Not as powerful as complete Description Logic solutions • Not as efficient on huge volumes of data as traditional database technologies (replication)
DAML + OIL • Allows hierarchical relations between classes of data to be described • Allows classes of data to be specified (creating a vocabulary!) using slot restrictions • Example: a “Workshop” is an “Event”; an “EU project” is a “Project” whose “funding organization” attribute has as its value an object of class “European Funding Organization”
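The two ideas on this slide (a subclass hierarchy, and a class defined by a restriction on a slot value) can be illustrated with a small sketch. It is plain Python, not DAML + OIL syntax, and it is only an approximation: it assumes a single-valued “funding organization” slot, and the identifiers `SUBCLASS_OF`, `is_a`, `is_eu_project`, `org:EC` and `org:FWF` are hypothetical names chosen for illustration.

```python
# Simplified, hypothetical model of the slide's example (not DAML + OIL syntax).
# Each class maps to its direct superclass.
SUBCLASS_OF = {
    "Workshop": "Event",
    "EUProject": "Project",
    "EuropeanFundingOrganization": "Organization",
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a transitive subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def is_eu_project(project, class_of):
    """Slot restriction: a Project whose funding organization is an object of
    class EuropeanFundingOrganization is recognised as an EU project."""
    funder = project.get("funding_organization")
    return (
        is_a(project["class"], "Project")
        and funder is not None
        and is_a(class_of.get(funder), "EuropeanFundingOrganization")
    )


# Assumed example data: which class each organization instance belongs to.
class_of = {"org:EC": "EuropeanFundingOrganization", "org:FWF": "Organization"}
p1 = {"class": "Project", "funding_organization": "org:EC"}
p2 = {"class": "Project", "funding_organization": "org:FWF"}
```

Here `is_eu_project(p1, class_of)` holds and `is_eu_project(p2, class_of)` does not, although neither record was explicitly labelled as an EU project; a Description Logic reasoner would derive this kind of membership from the schema itself.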
DAML + OIL • Distributed ontologies, e.g. my (AURIS-MM) project is a subClassOf CERIF:Project • Tools for ontology checking (Description Logics, CLOS-based theory for DAML) • Tools for ontology development • Tools for ontology visualization
DAML + OIL • Advanced information retrieval solutions • Implemented and tested • Projects: EU projects (On-To-Knowledge, KA3:IAF), DARPA project CAKE, WebScript, DAML Services, Knowledge Creation tools for DAML, ASCS, etc. • See www.cordis.lu, www.darpa.mil, www.daml.org, derpi.tuwien.ac.at/~andrei/DAML.htm
DAML + OIL • The first version of the ontology has been developed • http://derpi.tuwien.ac.at/~andrei/cerif-rdf-dc-mn.daml • Mappings (as subclass relations and axioms) to other well-known schemas (Dublin Core and MathNet) • Tested for simple information retrieval operations (but including semantic information)
DAML + OIL example of schema
<daml:Class rdf:ID="http://derpi.tuwien.ac.at/~andrei/cerif-rdf-dc-mn.daml#CERIF.Workshop">
  <rdfs:label>CERIF.Workshop</rdfs:label>
  <rdfs:comment />
  <oiled:creationDate>16:19:57 07.08.2001</oiled:creationDate>
  <rdfs:subClassOf>
    <daml:Class rdf:about="http://derpi.tuwien.ac.at/~andrei/cerif-rdf-dc-mn.daml#CERIF.Event" />
  </rdfs:subClassOf>
</daml:Class>
DAML + OIL • Easy creation of custom vocabularies based on shared vocabularies • Easy specification of which classes a given object is an instance of (multiple classes are possible)
DAML + OIL • Example: publications database • Classes for researchers: Dissertation, Conference article, Journal article, Journal with evaluations, Patent • Classes for university administration: • Class A (score 2): International Patent • Class B (score 1): Journal Article in an international journal that is a Journal with Evaluation
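A minimal sketch of the publications example, assuming each publication is a plain dictionary: the same object is classified once in the researchers' vocabulary and, independently, in the administration's scoring vocabulary. The field names (`type`, `international`, `journal_international`, `journal_evaluated`) and the function names are hypothetical and were chosen only for illustration.

```python
def admin_classes(pub):
    """Return the administration classes (name, score) a publication falls into."""
    classes = []
    if pub["type"] == "Patent" and pub.get("international"):
        classes.append(("Class A", 2))
    if (
        pub["type"] == "Journal article"
        and pub.get("journal_international")
        and pub.get("journal_evaluated")
    ):
        classes.append(("Class B", 1))
    return classes


def classify(pub):
    """One object, two vocabularies: the researcher view and the admin view."""
    return {"researcher": pub["type"], "administration": admin_classes(pub)}


# Assumed example records.
patent = {"type": "Patent", "international": True}
article = {
    "type": "Journal article",
    "journal_international": True,
    "journal_evaluated": True,
}
thesis = {"type": "Dissertation"}
```

With an ontology language the administration classes would be declared once as class definitions, and the membership of each publication would follow from its description instead of being hand-coded as above.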
DAML + OIL • A hierarchy of slots has been created, which makes information retrieval clearer and easier to implement • Example: full-text search operations are based on the “full-text description” slot (attribute); project_abstract, project_title and project_description are subslots of “full-text description” • If a new slot “project_last_year_summary” is added, specifying it as a subslot of “full-text description” is enough to include it in full-text search
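The slot hierarchy can be sketched as follows, assuming records are plain dictionaries and the hierarchy is an acyclic mapping from each slot to its parent slot. The search code never names the individual slots; it only asks for the subslots of `full_text_description`. All function names and the sample records are hypothetical.

```python
# Each slot maps to its parent slot (assumed acyclic).
SUBSLOT_OF = {
    "project_abstract": "full_text_description",
    "project_title": "full_text_description",
    "project_description": "full_text_description",
}


def subslots(parent, hierarchy):
    """Return all slots that are direct or indirect subslots of parent."""
    result = set()
    for slot in hierarchy:
        current = slot
        while current in hierarchy:
            current = hierarchy[current]
            if current == parent:
                result.add(slot)
                break
    return result


def full_text_search(records, term, hierarchy):
    """Return ids of records whose full-text slots contain term (case-insensitive)."""
    slots = subslots("full_text_description", hierarchy)
    term = term.lower()
    return [
        r["id"]
        for r in records
        if any(term in str(r.get(s, "")).lower() for s in slots)
    ]


# Assumed example records.
records = [
    {"id": "P1", "project_title": "Ontology tools", "project_abstract": "DAML editors"},
    {"id": "P2", "project_title": "Grid computing",
     "project_last_year_summary": "Added ontology support"},
]

# Adding the new slot is a one-line schema change; the search code is untouched.
extended = dict(SUBSLOT_OF, project_last_year_summary="full_text_description")
```

Searching for "ontology" with `SUBSLOT_OF` finds only P1; with `extended` it also finds P2, because the new slot is now covered by the full-text search.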
DAML + OIL Example of class hierarchy: from extended CERIF
RDF • DAML + OIL specifies the schema; it is also possible to encode data (“instances”) in DAML • For EuroCRIS we propose using RDF as the encoding format • The RDF description should be consistent with the DAML + OIL schema
RDF • A toolset has been developed to export/import data: CERIF database <-> CERIF RDF • A toolset to query CERIF RDF data (currently a very simple information retrieval operation, but distributed and with semantics) • A toolset to read data from CERIF RDF and load it into a Prolog knowledge base is being developed
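The slides do not show the export/import toolset itself, so the following is only an illustrative sketch of what a database row <-> RDF/XML round trip could look like, using Python's standard `xml.etree.ElementTree`. The namespace `http://example.org/cerif#`, the base URI, the column names and both function names are placeholders, not the names used by the actual CERIF RDF schema or tools. Column names are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CERIF_NS = "http://example.org/cerif#"  # placeholder, not the real schema URI

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("cerif", CERIF_NS)


def row_to_rdf(row, class_name, base_uri):
    """Export one database row (a dict with an 'id' key) as an RDF/XML string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    node = ET.SubElement(
        root,
        f"{{{CERIF_NS}}}{class_name}",
        {f"{{{RDF_NS}}}about": base_uri + str(row["id"])},
    )
    for column, value in row.items():
        if column == "id" or value is None:
            continue  # the id is carried by rdf:about; NULL columns are skipped
        ET.SubElement(node, f"{{{CERIF_NS}}}{column}").text = str(value)
    return ET.tostring(root, encoding="unicode")


def rdf_to_row(xml_text):
    """Import the first described resource back into a dict of strings."""
    node = ET.fromstring(xml_text)[0]
    row = {"id": node.get(f"{{{RDF_NS}}}about").rsplit("/", 1)[-1]}
    for child in node:
        row[child.tag.split("}", 1)[1]] = child.text
    return row


# Assumed example row.
xml_text = row_to_rdf(
    {"id": 7, "title": "AURIS-MM", "acronym": None},
    "Project",
    "http://example.org/project/",
)
```

Values come back as strings after the round trip, so a real importer would need type information from the schema to restore numbers and dates.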
Current work • An RDF version of CERIF-2000: a Knowledge Management solution for research, with RDF as the data store • New advanced information retrieval possibilities for CERIF
Proposal • As a test, try using DAML + OIL and RDF for data sharing and distributed retrieval operations between different EuroCRIS organizations • Create and deploy an advanced IR solution based on CERIF RDF and compatible with any CERIF database; make it free and a part of the CERIF implementation