730 likes | 744 Views
Learn about Semantic Sky, a software platform based on semantic web technologies that provides a unified and simple approach to integrating and sharing data across different cloud services.
E N D
Semantic Sky:Cloud services integration using semantic web technologies Prof. DimitarTrajanov 08 Jun 2011, CERTH, Greece
Agenda • Introduction: • Semantic web technologies basics • Semitic sky architecture • Examples • Conclusion
Cloud computing • Cloud computing refers to the on-demand provision of computational resources (data, software) via a computer network • Cloud Computing Stack • SaaS - Software as a Service • PaaS - Platform as a Service • IaaS - Infrastructure as a Service
The information we work with in our every day live • Obtained from different sources: • Web (Facebook, Twitter, …) • Intranet (e-mail, Enterprise applications, …) • Local data (local documents, …) • The number of information sources is increasing rapidly • Increased number of publicly available services • Increasing number of cloud services with specialized functionalities • Increased number of enterprise application • Depending on information type, we mainly take some actions, e.g. we share them or add them into a ToDo list
The problem • Interchange data among information sources • Need of complex and composite actions • Actions require a certain amount of time (get/copy the data, change the context, transfer the data, execute an action in destination service) • Services and the data are placed on different locations and infrastructures
Motivation • To develop a software platform which will provide the users with a unified and simple composite approach to the different services they use, and with a simple flow of information from one infrastructure to another. • To come to such a design, a large number of partial problems will have to be solved • Mechanisms for detection of the entities which are found within texts and information that we get from different services. • Based on the context in which the user is working, to offer actions (services) that can be performed on the entities. • Integration with local working environment of the user.
Solution: Semantic Sky • The system is called “SemanticSky”, because it is an environment where many cloud services will exist and interact with each-other • It is based on semantic web technologies • Reuse of known ontologies (FOAF, AIISO, University Ontology,GeoNames, …)
Related work • There are projects that are focused on the connectivity of different cloud infrastructures (mOSAIC, SITIO, …) • Microsoft Outlook plug-in • Xobni ffers fast search and people-based navigation of email archives. • Mashin organizes information extracted from email history contextually. • Google mail plug-in
Related work • Babylon-Enterprise is a web-configured client-server system based on a Windows program (Babylon- Enterprise Client) installed on the end-user’s workstation and an enterprise application server (Babylon-Enterprise Server). • Gives the ability to access all enterprise information and data from every working environment. • Greplin is a personal search engine that allows you to search all your online data in one place.
A Semantic Web Primer A Layered Approach • The development of the Semantic Web proceeds in steps • Each step building a layer on top of another Principles: • Downward compatibility • Upward partial understanding
A Semantic Web Primer Current Semantic Web Stack
Semantic Web Open Standards • RDF – Store data as “triples” • OWL – Define systems of concepts called “ontologies” • Sparql – Query data in RDF • SWRL – Define rules • GRDDL – Transform data to RDF
Predicate Subject Object RDF “Triples” • the subject, which is an RDF URI reference or a blank node • the predicate, which is an RDF URI reference • the object, which is an RDF URI reference, a literal or a blank node Source: http://www.w3.org/TR/rdf-concepts/#section-triples
RDBMS vs Triplestore Person Table S P O Subject Predicate Object 001 isA Person 001 firstName Jim 001 lastName Wissner 001 hasColleague 002 002 isA Person 002 firstName Nova 002 lastName Spivack 002 hasColleague 003 003 isA Person 003 firstName Chris 003 lastName Jones 003 hasColleague 004 004 isA Person 004 firstName Lew 004 lastName Tucker f_name jim nova chris lew ID 001 002 003 004 l_name wissner spivack jones tucker Colleagues Table SRC-ID 001 001 001 001 002 002 002 002 003 003 003 003 004 004 004 004 TGT-ID 001 002 003 004 001 002 003 004 001 002 003 004 001 002 003 004
Merging Databases in RDF is Easy S P O S P O S P O
Ontologies • The term ontology originates from philosophy • The study of the nature of existence • Different meaning from computer science • An ontology is an explicit and formal specification of a conceptualization • Ontologies provide a shared understanding of a domain (semantic interoperability) • overcome differences in terminology • mappings between ontologies • There are many available onotologies for different domains
A Semantic Web Primer Typical Components of Ontologies • Terms denote important concepts (classes of objects) of the domain • e.g. professors, staff, students, courses, departments • Relationships between these terms: typically class hierarchies • a class C to be a subclass of another class C' if every object in C is also included in C' • e.g. all professors are staff members
Further Components of Ontologies • Properties: • e.g. X teaches Y • Value restrictions • e.g. only faculty members can teach courses • Disjointness statements • e.g. faculty and general staff are disjoint • Logical relationships between objects • e.g. every department must include at least 10 faculty
System Overview Sparql Endpoint Semantic Sky Resource Retrieval Ontology Desktop-client Cloud Service 1 Semantic Sky Cloud Service 2 Action Invocation Cloud plug-in Cloud Service 3 Browser Plug-in
Knowledge base • RDF data store • Apache Lucene is used as indexing engine • Each triple (statement), rdf:class and rdf:property are indexed as separated entities (Lucene Document) • Extensible
Knowledge base extension • Owl/Rdf file upload • Using Jena API to extract semantic resources • Calls the indexer to index resources • Owl/Rdf URI • Paste the link to the Owl or Rdf document • Jena extract the resources and passes them to the Lucena indexer • SPARQL endpoints • By providing a URL to the endpoint • Connect to the endpoint address and fetch all data • Lucene indexes the fetched data from the endpoint
Web service repository • Used for faster service discovery • Semantically annotated web services • Service input types • Service output types • Any ontology can be used for annotation of the web services • Extensible
Extending the WS Repository • Using existing web services • Annotating using the SAWSDL standard • Annotation tool developed • Import the annotated WSDL file into the repository • Creating new web services • Develop the web service • Repeat the steps for existing web services • Using REST web services • Tool for semantic mapping of REST services (in progress)
Extensibility in action • We have system that enables Task Management and exports web services for this. • We want to add new functionality about Task Management. • What do we do to enable this? • Import the ontology for this domain, if there is no any • Annotate the services (Preferably with our tool) • Define actions • It is on and can be used
Data Linking and Inference Engine • System entry point • Accepts text • Return semantic resources correlated with the text • For each token (word) in the text, we extract all resources related to it • Extraction is made using Apache Lucene Search • All Lucene entities retrieved from the search are converted to semantic resources
Data Linking and Inference Engine Ontology index Data Linking and Inference Engine Find the resources for the text from the index Ontology For each resource, get its properties SPARQL endpoints Group resources by type Type : [ {p1:v1,p2:v2,..,uri:#res1}, {p1:v1',p3:v3',..,uri:#res2} ]
Action Search Action Search Semantic WS Repository Type : [ {p1:v1,p2:v2,..,uri:#res1}, {p1:v1',p3:v3',..,uri:#res2} ] Align resource types as inputs Find all operations from the Repository for these inputs Are there entries in the repository Semantically annotated web services no yes Find all compositions with these inputs and store them in the repository Assemble action XML result <Action> <id>uid</id> <inputs>....</inputs> </Action>
Operations Retrieval • Searching operations (web service methods) from the repository • Service compositions are made when possible • Uses the types of the extracted resources to find the operations • User_Defined_Input • rdfs:Class used to denote that this input will be rendered as input text at the client side • Implicit input type • it will be placed in the inputs list, even when no resource from this type is extracted • The user must provide the value for this type
Action Form User_Defined_Input
UI Generator <Action> <id>uid</id> <inputs>....</inputs> </Action> Type : [ {p1:v1,p2:v2,..,uri:#res1}, {p1:v1',p3:v3',..,uri:#res2} ] UI Generator Find transformer for Resource type Transformers Transform the resource Transform the actions
Action Invocation • The generated form contains all parameters for action invocation • Single service for action invocation • It assembles the parameters and invokes the actual services • The result is returned back to the user
Implementation detailsData integration - Enterprise data - Opening the data
Example:University data • Most of today information systems (IS) store their data in relational databases • This data is published in a structured way, in RDF format on the Semantic Web • What we publish? basic information about the Faculties and deeper information about our CSE Faculty (Institutes, Modules, Programs, Courses, Subjects, Employees) • Few universities, most of them in the UK, have already started open data projects, which are still in development
Semantic data publishing • There are many tools for publishing the content of relational databases on the Semantic Web like D2R Server, Oracle Spatial 11g, Asio Semantic Bridge, SquirrelRDF and many others • We use the D2R Server • D2R Server enables RDF and HTML browsers to navigate the content of the database,and allows applications to query the database using the SPARQL query language • http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/
Open Linked data • Our goal is five star data – data linked to other people’s data to provide context • We connect to well known ontologies, which already have definitions for our types of data • D2RQ Mapping Language is a declarative mapping language for describing the relation between an ontology and an relational data model
Ontologies • For describing our data we need few well known ontologies • The Web Ontology Language (OWL) • FOAF - ontology describing persons, their activities and their relations to other people and objects, it is used for the employees • The Academic Institution Internal Structure Ontology (AIISO) - provides classes and properties to describe the internal organizational structure of an academic institution • University Ontology – same purpose as AIISO, but contains some additional features needed for describing our data • GeoNames Ontology - makes it possible to add geospatial semantic information to the Word Wide Web
Sparql Endpoint • Changes in the mapping .n3 file for connecting with the ontologies have to be made manually • After the .n3 file is edited, it can be run with D2R Server and in the Sparql Endpoint, queries can be written using the prefixes from the ontologies • The Sparql Endpoint shows the data in triples: subject, predicate and object • http://e-tech2.feit.ukim.edu.mk/open-data/snorql/
University Open Data Example • This query shows basic information about the Professor Trajanov and the courses he teaches • http://e-tech2.feit.ukim.edu.mk/open-data/snorql/?describe=http%3A%2F%2Fe-tech2.feit.ukim.edu.mk%2Fopen-data%2Fresource%2Fdbo.EMPLOYEES%2F64
University Open Data Example • This query shows basic information about the subject Network Programming and the courses of that subject • http://e-tech2.feit.ukim.edu.mk/open-data/snorql/?describe=http%3A%2F%2Fe-tech2.feit.ukim.edu.mk%2Fopen-data%2Fresource%2Fdbo.SUBJECTS%2F1
D2R Server Mapping tool • Manually editing the .n3 file is time consuming, so we created application called the D2R Server Mapping Tool to connect to the ontologies • The user first enters the database which wants to be published, then the application generates .n3 file using the D2R server • The Mapping tool then converts the .n3 file to .rdf file, format which can be easily shown in a visual xml-alike tree • The user can choose some class or property from the tree and just add or remove reference from an ontology • Ontologies can also be added and removed from the application
Implementation details Data access Cloud plug-in Desktop application
Google Gadgets Embed application's UI into Gmail, Calendar, Spreadsheets and Sites, using the OpenSocial standard
What are Gmail Gadgets? • Custom HTML & JavaScript components • Run within an iframe • Extend Gmail with additional functionality • Implement the Google gadgets API • Two types of Gmail Gadgets • Sidebar Gadgets • Contextual Gadgets