Realizing Interoperability of Heterogeneous Repositories Daniel Olmedilla L3S Research Center / Hannover University Programa de Postgrado en Ingeniería Informática y de Telecomunicación (Máster y Doctorado) Universidad Autónoma de Madrid, 10th April, 2008
Outline • Introduction and Motivation • Interoperability: what is it and why is it needed? • Common Query Interface • Common Metadata Schema • Ranking • Successful Interoperability Demonstrations • Conclusions & Open Issues Universidad Autónoma de Madrid
Introduction: Simple Motivation Scenario (I) • Simple Scenario: • Alice is interested in learning about Windows and would like to attend a lecture about it this year
Introduction: Simple Motivation Scenario (& II)
Introduction: Search Engine Limitations • Unstructured information and lack of semantics • Size and coverage of the Web • Hidden Web (also Deep Web) • Personalized Ranking
Introduction: Other Approaches: Coalitions • Repositories interconnected • Lack of standards, ad-hoc solutions • Individual agreement required to join • Approaches • Replication • Loss of control over data, sometimes undesirable • Federated Search • Lack of standards makes it costly
Introduction: Other Approaches: P2P Networks • Advantages • Scalability • No single point of failure • Control remains with owners • Dynamicity • Disadvantages • Decrease in performance • Ad-hoc interfaces lead to a lack of interoperability
Introduction: A Bit More Complex Motivation Scenario • Alice is a consultant and she has been asked to lead a project starting in two months. Now she needs to retrieve courses in order to • refresh and improve her previous knowledge of project management • get some basic knowledge about accounting and auditing • practice her advanced level of English
Introduction: Problem Statement • Lack of standards and of appropriate integration solutions prevents users from easily and effectively finding resources relevant to their needs
Outline • Introduction and Motivation • Interoperability: what is it and why is it needed? • Definition • Why Interoperability? • Challenges to achieve it • Common Query Interface • Common Metadata Schema • Ranking • Successful Interoperability Demonstrations • Conclusions & Open Issues
Interoperability: What and Why? Exercise 1: simple questions • What is interoperability? • What does it mean for two systems to interoperate? • And at the information level?
Interoperability: What and Why? What is it? • Summary from existing definitions: • Ability to work together to accomplish a common task • Work in conjunction • Exchange information and USE it • Provided at different levels • Without increasing the effort of the user • [Concise Oxford Dictionary, NISO, IEEE: Standard Computer Dictionary, DMReview, Whatis.com]
Interoperability: What and Why? Interoperability encompasses … • Technical Interoperability • Semantic Interoperability • Political Interoperability • Inter-community Interoperability • Legal Interoperability • International Interoperability
Interoperability: What and Why? Investment in Technology • ICT Globally • $1.45 trillion annually • Technology in Europe • €6.4 billion in 2004 • Increasing (10% more than previous year) • [Money for Growth, The European Technology Investment Report 2005. PricewaterhouseCoopers Report, Jun. 2005]
Interoperability: What and Why? Key Technological Issues (I) • 38 industry associations in 27 different countries • The most significant technology issues … included • Integration (21%) • Standards (20%) • [International Survey of E-Commerce. World Information Technology and Services Alliance (WITSA), 2000]
Interoperability: What and Why? Key Technological Issues (& II) • [International Survey of E-Commerce. World Information Technology and Services Alliance (WITSA), 2000]
Interoperability: What and Why? Interoperability Inhibited by Cost • “Although interoperability is a significant strategic direction, it is often inhibited by cost” • [Survey: Integration costs still hamper agility. Computerworld Today, February 2006]
Interoperability: What and Why? User Effectiveness: Some Facts • User Effectiveness • Knowledge workers spend from 15% to 35% of their time searching for information • Searchers are successful in finding what they seek 50% of the time or less • Total Loss • not finding the right information: estimated at between $2.5 and $3.5 million per year for an enterprise with 1000 knowledge workers • opportunity cost: potential additional revenue of $15 million annually • [Feldman. The high cost of not finding information. IDC White Paper & KMWorld Magazine, 2004]
Interoperability: What and Why? Challenges to achieve it
Interoperability: What and Why? E-Learning Study Analysis: Technical Requirements • Training-life-cycle in companies across Europe • Retrieving learning services from a wide variety of providers • Search heuristics • Metadata queries • Matching skill gaps with learning service selections • Matching personal development gaps with learning services • [Gunnarsdottir. User Trials – Evaluation Report. EU IST ELENA Deliverable, May 2005]
Outline • Introduction and Motivation • Interoperability: what is it and why is it needed? • Common Query Interface • Simple Query Interface • Opening P2P to the rest of the World • Common Metadata Schema • Ranking • Successful Interoperability Demonstrations • Conclusions & Open Issues
Common Communication Interface Simple Query Interface (SQI) • Simple but highly flexible: targets different interoperability scenarios • Official CEN/ISSS Workshop Agreement since October 2006 • Listed by IMS on Query Services • Widely adopted in the E-Learning community
Common Communication Interface Simple Query Interface: Design Issues • Independent of query language, result format and vocabularies • Complex information sources may be queried (e.g., P2P networks) • Synchronous and asynchronous • Support for lightweight implementations • Stateful and stateless • Separation of access control and search • Easy extensibility
Common Communication Interface Simple Query Interface: Session Management • Authentication/authorization are requirements • Independent of the search interface • Separation is managed via sessions • session createAnonymousSession() • session createSession(user, passwd) • destroySession(sessionId) • Other methods are allowed (e.g., based on credentials or trust negotiations)
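The separation between session handling and search can be illustrated with a minimal Python sketch. It is only a toy, in-memory stand-in: the class name `InMemorySessionManager`, the `accounts` table and the `is_valid` helper are assumptions made for illustration and are not part of the SQI specification; only the three operation names come from the slide.

```python
import secrets


class InMemorySessionManager:
    """Toy stand-in for a session service (hypothetical, not the SQI API)."""

    def __init__(self, accounts):
        # accounts: hypothetical {user: password} table, for illustration only
        self._accounts = dict(accounts)
        self._sessions = {}  # session id -> user name (None if anonymous)

    def _open(self, user):
        session_id = secrets.token_hex(16)  # random, hard-to-guess identifier
        self._sessions[session_id] = user
        return session_id

    def create_anonymous_session(self):
        """Counterpart of createAnonymousSession()."""
        return self._open(None)

    def create_session(self, user, passwd):
        """Counterpart of createSession(user, passwd)."""
        if user not in self._accounts or self._accounts[user] != passwd:
            raise PermissionError("unknown user or wrong password")
        return self._open(user)

    def destroy_session(self, session_id):
        """Counterpart of destroySession(sessionId)."""
        if session_id not in self._sessions:
            raise KeyError("no such session")
        del self._sessions[session_id]

    def is_valid(self, session_id):
        # A search service would only need this check, never the password.
        return session_id in self._sessions
```

A query service built on top of this would accept a session identifier with each call and ask `is_valid` before answering, which is one way to keep access control out of the search interface itself.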
Common Communication Interface Traditional Access Control in Decentralized Systems • Assumption: I already know you: you have a local account! • Not a member?
Common Communication Interface Trust Negotiation: Features • Trust is based on parties’ properties • Every party can define access control policies to control outsiders’ access to their sensitive resources • Establish trust iteratively and bilaterally by the disclosure of certificates and by requests for certificates
Common Communication Interface Trust Negotiation: Example (Alice and Bob) • Step 1: Alice requests a service from Bob • Step 2: Bob discloses his policy for the service • Step 3: Alice discloses her policy for VISA • Step 4: Bob discloses his BBB credential • Step 5: Alice discloses her VISA card credential • Step 6: Bob grants access to the service
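The Alice and Bob exchange can be mimicked with a small Python sketch. It assumes a deliberately simplified "eager" strategy in which any item whose policy is already satisfied is disclosed immediately; the function `negotiate` and the way policies are encoded as sets of required items are illustrative assumptions, not the policy language used in the talk.

```python
def negotiate(requested, policies):
    """Return the order of disclosures that unlocks `requested`, or None.

    policies maps an item, written as (owner, name), to the set of items
    the other party must have disclosed before this item is released.
    """
    disclosed = set()
    trace = []
    progress = True
    while progress:
        progress = False
        for item, required in policies.items():
            # Release an item as soon as everything it depends on is disclosed.
            if item not in disclosed and required <= disclosed:
                disclosed.add(item)
                trace.append(item)
                progress = True
                if item == requested:
                    return trace
    return None  # no further disclosure possible: negotiation fails


# Policies loosely following the slide: the service needs Alice's VISA,
# the VISA credential needs Bob's BBB credential, BBB is freely disclosable.
POLICIES = {
    ("Bob", "service"): {("Alice", "VISA")},
    ("Alice", "VISA"): {("Bob", "BBB")},
    ("Bob", "BBB"): set(),
}

print(negotiate(("Bob", "service"), POLICIES))
```

With these policies the disclosures happen in the order BBB credential, VISA credential, service, which corresponds to steps 4 to 6 above. If two policies depend on each other in a cycle, the loop stops without progress and the sketch returns `None`.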
Common Communication Interface Simple Query Interface: Query (I)
Common Communication Interface Simple Query Interface: Query (& II)
Common Communication Interface P2P Proxying Architecture • [Brunkhorst, Olmedilla. Interoperability for peer-to-peer networks: Opening P2P to the rest of the World. EC-TEL, Oct 2006]
Outline • Introduction and Motivation • Interoperability: what is it and why is it needed? • Common Query Interface • Common Metadata Schema • Learning Resource Schema • Competence Modeling • Ranking • Successful Interoperability Demonstrations • Conclusions & Open Issues
Common Metadata Schema: Data Integration • Global As View (GAV) • Local As View (LAV)
Common Metadata Schema: Data Integration • Given a query, reformulating it in terms of the sources • Is easier in GAV (just needs unfolding of the query) • Is harder in LAV • Adding a new source • Supposedly easier in LAV (just need to express the new source as a view of the global schema) • Harder in GAV (as the global schema needs to be revised)
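To make "unfolding" concrete, here is a minimal Python sketch of the GAV idea, under the assumption that a global relation can be modeled as a function defined over the sources. The two sources, their field names and the functions `global_course` and `query_courses` are invented for illustration only.

```python
# Two hypothetical sources with different local schemas.
SOURCE_A = [{"name": "Project Management 101", "lang": "en"}]
SOURCE_B = [{"titel": "Buchhaltung Grundlagen", "sprache": "de"}]


def global_course():
    """GAV mapping: the global relation course(title, language) is defined
    as a view (here, a generator) over the local sources."""
    for row in SOURCE_A:
        yield {"title": row["name"], "language": row["lang"]}
    for row in SOURCE_B:
        yield {"title": row["titel"], "language": row["sprache"]}


def query_courses(language):
    """A query over the global schema. Answering it only requires replacing
    the global relation by its definition, which is the unfolding step."""
    return [c["title"] for c in global_course() if c["language"] == language]


print(query_courses("en"))
```

The sketch also hints at the GAV drawback listed above: adding a third source means editing `global_course`, that is, revising the definition of the global relation.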
Common Metadata Schema: Simple Learning Resource Schema
Common Metadata Schema: Complex Learning Resource Schema
Common Metadata Schema: Competence Requirements • Excerpt extracted from a newspaper • Complete Master’s Degree (any faculty) • Expert knowledge in Java (J2EE, Servlets, JSP) • Very good IT English and / or Spanish • Drawbacks • Does not indicate what is mandatory or optional • It is not machine-understandable
Common Metadata Schema: Competence Definition • “an effective performance within a domain / context at different levels of proficiency” • Example: Competency “English Language”, Level “Advanced”, Context “Computer Science”
Common Metadata Schema: Competency • We use IEEE RCD to represent a Competency • Uniquely identifies an isolated competency • Enriched with human-readable titles and descriptions
Common Metadata Schema: Proficiency Level • Reusable scales of totally ordered proficiency levels • Each level is identified by an ID, a human-readable label and an optional mapping to a numerical domain
Common Metadata Schema: Context • “... the interlaced conditions in which something exists or occurs” • Competences might be interpreted differently in a different context • Contexts are defined in tree-like hierarchies • Easier to model and to handle • Simpler algorithms, no cycle detection necessary • May optionally link to additional ontologies
Common Metadata Schema: Competence • Links to the dimension objects • High degree of reusability • Better support for gap analysis • Competences can be simple or composed of other (arbitrarily nested) competences • Aggregation • Set Selection
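The three dimensions (competency, proficiency level, context) and a very simple gap check can be sketched in Python as follows. The class layout, the three-level `SCALE` and the rule used in `satisfies` (same competency, context equal to or below the required one in the tree, level at least as high) are assumptions chosen for illustration; they are not the schema presented in the talk, and composed competences are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Competency:
    id: str     # unique identifier
    title: str  # human-readable title


@dataclass(frozen=True)
class Context:
    name: str
    parent: Optional["Context"] = None  # tree-like hierarchy, so no cycles

    def is_within(self, other):
        """True if this context is `other` or one of its descendants."""
        node = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


@dataclass(frozen=True)
class Competence:
    competency: Competency
    level: str        # label taken from an ordered proficiency scale
    context: Context


# Hypothetical, totally ordered proficiency scale (lowest first).
SCALE = ("basic", "intermediate", "advanced")


def satisfies(held, required, scale=SCALE):
    """Assumed gap-analysis rule: does `held` cover `required`?"""
    return (held.competency == required.competency
            and held.context.is_within(required.context)
            and scale.index(held.level) >= scale.index(required.level))
```

For example, an "advanced" English competence held in a context "Databases" whose parent is "Computer Science" would cover a requirement for "intermediate" English in "Computer Science" under this rule, but not the other way round.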
Common Metadata Schema: A Bit More Complex Motivation Scenario (Revisited) • Alice is a consultant and she has been asked to lead a project starting in two months. Now she needs to retrieve courses in order to • refresh and improve her previous knowledge of project management • get some basic knowledge about accounting and auditing • practice her advanced level of English
Outline • Introduction and Motivation • Interoperability: what is it and why is it needed? • Common Query Interface • Common Metadata Schema • Ranking • Link-based Personalized Ranking Platform • Successful Interoperability Demonstrations • Conclusions & Open Issues
Ranking: PageRank • Page score based on the link structure of the web • It measures page popularity • Page i pointing to page j means a vote from i to j • The more backlinks a page has, the more important it is • The score of a page is computed from the sum of the ranks of its backlinks
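The voting idea can be sketched with a short power iteration in Python. The damping factor of 0.85, the fixed number of iterations and the uniform treatment of pages without outgoing links are common choices assumed here; none of them is stated on the slide, and the function name `pagerank` is only illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small base score ...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # ... plus an equal share of the rank of each page linking to it.
            targets = links.get(p) or pages  # no out-links: spread evenly
            share = damping * rank[p] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Pages a and b both vote for c, and c votes for a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this small graph, c collects two votes and ends up ranked highest, a receives the single vote of the highly ranked c, and b, which has no backlinks, keeps only the base score.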
Ranking: PageRank Example
Ranking: PageRank Personalization • It has a personalization vector • Computationally expensive: not possible to make the whole computation for each user
Ranking: Personalized PageRank • Hubs: pages pointing to many important pages • Compute one Personalized PageRank Vector (PPV) for each user • Challenges: • Reduce storage required • Reduce time for computation • Each PPV corresponding to a Preference Set P can be expressed as a linear combination of Basis Hub Vectors • Each Basis Hub Vector is decomposed into two parts: • Hub skeleton vector (common interrelationships and precomputed) • Partial vector (unique values and computed at construction-time)
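The linear-combination property can be checked numerically with a small Python sketch. It illustrates only that property: the decomposition into hub skeleton and partial vectors is not implemented, and the damping factor, iteration count, example graph and function names are assumptions made for illustration.

```python
def personalized_pagerank(links, preference, damping=0.85, iterations=100):
    """preference maps every page to its weight (weights sum to 1)."""
    pages = sorted(preference)
    rank = dict(preference)
    for _ in range(iterations):
        # The random jump goes to the preferred pages instead of all pages.
        new_rank = {p: (1.0 - damping) * preference[p] for p in pages}
        for p in pages:
            targets = links.get(p) or pages  # no out-links: spread evenly
            share = damping * rank[p] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


def basis_vector(links, pages, hub):
    """PPV whose preference set contains the single page `hub`."""
    return personalized_pagerank(links, {p: float(p == hub) for p in pages})


def combine(basis_vectors, weights):
    """Weighted sum of basis vectors, one weight per hub."""
    pages = next(iter(basis_vectors.values()))
    return {p: sum(w * basis_vectors[h][p] for h, w in weights.items())
            for p in pages}


PAGES = ["a", "b", "c", "d"]
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

basis = {h: basis_vector(LINKS, PAGES, h) for h in ("a", "b")}
combined = combine(basis, {"a": 0.5, "b": 0.5})
direct = personalized_pagerank(LINKS, {"a": 0.5, "b": 0.5, "c": 0.0, "d": 0.0})
print(max(abs(combined[p] - direct[p]) for p in PAGES))
```

The printed difference is at the level of floating-point rounding, which is why per-hub vectors can be stored once and mixed per user instead of running the whole computation for every preference set.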
Ranking: Personalized PageRank Limitations • Personalization relies on the user’s ability to choose a good Preference Set • High quality hubs which match his preferences • This process can be automated: • Information collected from the user can be used to derive his Preference Set • User does not even need to know what a hub is
Ranking: A Personalized Ranking Platform (I)