Autocompletion for Mashups

Ohad Greenshpan, Tova Milo, Neoklis Polyzotis
Tel-Aviv University, UCSC

Talk Roadmap
• Introduction to Mashups and Autocompletion
• Problem Definition
• The Algorithm
• Implementation & Experiments
• Conclusions & Related Work

Introduction - What is a mashup? A mashup is a technology for integrating data, services and applications available on the web into a single application.

Mashup Platform (figure: several applications, each with GUI, logic and data layers, combined by application integration into a single application with its own GUI, logic and data)

Mashup Development is difficult ...
• Choose some relevant components from the components repository
• Decide which should be connected and learn their specs
• Glue them together
(Figure: components repository and mashup repositories.)


Introduction - Mashup Autocompletion

The Mashup Model (figure): mashlets encapsulate data & logic and expose mashlet APIs; a glue pattern (GP) connects the APIs of several mashlets.

Inheritance (figure: mashlets A and B)

Mashup Autocompletion – Problem Definition
Given a database of mashlets and GPs and a set of mashlets selected by the user, identify and rank GPs that link a subset of the selected mashlets.
Ranking is based on:
• Popularity
• Relevance to the user query
What would be the “ideal” GP? The most popular one that connects only the user's mashlets and nothing else.
Relaxations:
• Less popular
• Connects variants of the user mashlets
• Connects a subset of the user mashlets
• Connects additional mashlets

Inheritance

Problem Abstraction
• Each glue pattern is represented as a point in a multidimensional space.
• One dimension represents the GP popularity.
• The remaining dimensions represent all mashlets: 1) user mashlets, 2) other mashlets.
• The algorithm's goal is to find the top-k GPs that link the given user mashlets (the ones closest to the optimal GP).
(Figure: a simplified 3D illustration with axes GP Popularity, m1 and m2; an example GP g is shown as a vector of popularity and mashlet coordinates.)
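To make the abstraction concrete, here is a minimal sketch that ranks GPs by their L1 distance to the ideal point. The distance measure, the normalization of popularity to [0, 1], and the use of fractional coordinates for a GP that connects only a variant of a user mashlet are all assumptions for illustration; the talk does not give the actual scoring function, and every name below is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: gp_id -> (popularity, {mashlet_id: coordinate}).
# Coordinates lie in [0, 1]; a value below 1 is assumed to mark a GP that
# connects only a variant (via inheritance) of that mashlet.
GlueIndex = Dict[str, Tuple[float, Dict[str, float]]]


def distance_to_ideal(popularity: float, coords: Dict[str, float],
                      user: Set[str]) -> float:
    """L1 distance to the ideal GP: popularity 1, user mashlets 1, all others 0."""
    d = 1.0 - popularity
    for m in user:
        d += 1.0 - coords.get(m, 0.0)  # missing or only partially matched user mashlet
    for m, value in coords.items():
        if m not in user:
            d += value  # the GP connects an additional mashlet
    return d


def rank_glue_patterns(gps: GlueIndex, user: Set[str], k: int) -> List[str]:
    """Brute force: score every GP and return the ids of the k closest ones."""
    return sorted(gps, key=lambda g: (distance_to_ideal(*gps[g], user), g))[:k]


if __name__ == "__main__":
    example = {
        "g1": (0.9, {"maps": 1.0, "weather": 1.0}),
        "g2": (0.6, {"maps": 1.0, "weather": 1.0, "hotels": 1.0}),
        "g3": (0.8, {"maps": 0.4}),  # connects only a variant of "maps"
    }
    print(rank_glue_patterns(example, {"maps", "weather"}, 2))  # ['g1', 'g2']
```

This version scores every GP in the database; a top-k algorithm exists precisely to avoid that full scan.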

Data Structure & Basic Top-k Algorithm (figure: a GP popularity list and one list per mashlet, each holding glue patterns)
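A minimal sketch of such an index, assuming each GP is stored as a popularity value plus a map from mashlet to coordinate (the representation and names are hypothetical): one list sorted by popularity and one list per mashlet, each in descending order. A basic top-k algorithm in the style of Fagin's Threshold Algorithm would then scan all of these lists round-robin, which is the cost the following slide points out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: gp_id -> (popularity, {mashlet_id: coordinate}).
GlueIndex = Dict[str, Tuple[float, Dict[str, float]]]
SortedList = List[Tuple[str, float]]  # (gp_id, value), highest value first


def build_sorted_lists(gps: GlueIndex) -> Tuple[SortedList, Dict[str, SortedList]]:
    """Build one popularity list and one list per mashlet, each sorted descending."""
    def order(entry: Tuple[str, float]) -> Tuple[float, str]:
        return (-entry[1], entry[0])  # value descending, ties broken by gp_id

    pop_list = sorted(((g, pop) for g, (pop, _) in gps.items()), key=order)
    per_mashlet: Dict[str, SortedList] = defaultdict(list)
    for g, (_, coords) in gps.items():
        for m, value in coords.items():
            if value > 0:  # a GP appears only in the lists of mashlets it touches
                per_mashlet[m].append((g, value))
    for lst in per_mashlet.values():
        lst.sort(key=order)
    return pop_list, dict(per_mashlet)
```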

Problems with the algorithm
• The number of lists the algorithm accesses is very large
• Most of the mashlet lists are unrelated to the user selection (query)

Data Structure (figure: the GP popularity list, the lists of the user mashlets and the lists of the other mashlets, each holding glue patterns)

Algorithm: the user mashlets occupy dimensions $1, \dots, n$, and $p_{g'}[m] = 0$ for $n < m \le |M_{all}|$
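One way to avoid scanning the lists of non-user mashlets can be sketched as a threshold-style top-k loop over only the popularity list and the user-mashlet lists. This is not the AC* pseudocode from the paper. It assumes a score of popularity plus user-mashlet coordinates minus the coordinates of all other mashlets (equivalent to ranking by L1 distance to the ideal point), non-negative coordinates, and random access to each GP's full coordinate map so that the penalty for extra mashlets never requires reading their lists. All names are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical representation: gp_id -> (popularity, {mashlet_id: coordinate}).
GlueIndex = Dict[str, Tuple[float, Dict[str, float]]]
SortedList = List[Tuple[str, float]]  # (gp_id, value), highest value first


def score(gp: Tuple[float, Dict[str, float]], user: Set[str]) -> float:
    """Assumed score: popularity + coverage of user mashlets - extra mashlets."""
    pop, coords = gp
    covered = sum(coords.get(m, 0.0) for m in user)
    extra = sum(coords.values()) - covered  # a per-GP total could be precomputed
    return pop + covered - extra


def restricted_topk(pop_list: SortedList, mashlet_lists: Dict[str, SortedList],
                    gps: GlueIndex, user: Set[str], k: int) -> List[str]:
    """Threshold-style top-k that never reads the lists of non-user mashlets."""
    if k <= 0:
        return []
    lists = [pop_list] + [mashlet_lists.get(m, []) for m in sorted(user)]
    top: List[Tuple[float, str]] = []  # min-heap holding the best k (score, gp_id)
    seen: Set[str] = set()
    depth = 0
    while any(depth < len(lst) for lst in lists):
        bounds = []
        for lst in lists:  # one round-robin step of sorted access
            if depth >= len(lst):
                bounds.append(0.0)  # exhausted: unseen GPs have value 0 here
                continue
            g, value = lst[depth]
            bounds.append(value)
            if g not in seen:
                seen.add(g)
                s = score(gps[g], user)  # random access to the GP's coordinates
                if len(top) < k:
                    heapq.heappush(top, (s, g))
                elif s > top[0][0]:
                    heapq.heapreplace(top, (s, g))
        depth += 1
        # An unseen GP scores at most the sum of the current list values,
        # because its penalty for extra mashlets is never negative.
        if len(top) == k and top[0][0] >= sum(bounds):
            break
    return [g for _, g in sorted(top, reverse=True)]
```

The stopping rule mirrors the correctness argument on the next slide: once the k-th best score reaches the threshold, no unseen candidate can outrank it.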

Correctness of AC* - Lemma
• Theorem 4.1: Algorithm AC* returns a correct solution.
• The proof is based on a lemma showing that any candidate not encountered by AC* has a total score lower than the threshold.

Optimality of AC*
• Competing algorithms:
  • $\mathcal{C}$ – the class of deterministic algorithms that operate under the same access model as AC*.
  • Algorithms receive as input the lists, the monotonic function and k.
  • Algorithms can use any access order (i.e., not necessarily round-robin) and any thresholding scheme, and can rely on accessed elements.
• Instance optimality:
  • AC* is instance optimal within class $\mathcal{C}$ if there are constants $c$ and $c_0$ such that for every input instance $I$, $\mathrm{cost}(AC^*, I) \le c \cdot \mathrm{cost}(A, I) + c_0$ for every $A \in \mathcal{C}$.

Calculating Popularity: Glue Pattern and Mashlet Rank
• PageRank-style algorithm
• Takes into account the popularity of mashlets and GPs, as well as the relationships between them
(Figure: graph linking GPs and mashlets.)
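The talk gives no formula for the PageRank-style computation. A minimal sketch under stated assumptions: GPs and mashlets form an undirected graph (a GP is linked to each mashlet it connects), prior usage counts serve as the teleport distribution so that already-popular items get a head start, and ranks are computed by power iteration with damping 0.85. All of these choices, and the function name, are hypothetical.

```python
from typing import Dict, Set


def popularity_rank(edges: Dict[str, Set[str]], prior: Dict[str, float],
                    damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Personalized-PageRank power iteration over a GP-mashlet graph.

    edges: node -> neighbours; assumed symmetric, with every neighbour a key.
    prior: non-negative usage counts (not all zero), used as teleport weights.
    """
    nodes = list(edges)
    total = sum(prior.get(n, 0.0) for n in nodes)
    teleport = {n: prior.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            nbrs = edges[n]
            if nbrs:
                share = damping * rank[n] / len(nbrs)  # split rank among neighbours
                for v in nbrs:
                    nxt[v] += share
            else:
                # Dangling node: redistribute its rank via the teleport vector.
                for v in nodes:
                    nxt[v] += damping * rank[n] * teleport[v]
        rank = nxt
    return rank
```

Each iteration preserves total rank mass, so the result remains a probability distribution over GPs and mashlets.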

Implementation (architecture figure with numbered interaction steps 1 to 5): IBM Mashup Center, WebSphere Application Server, knowledge base, MatchUp algorithm

Experiments (synthetic dataset)
Synthetic dataset for large-scale experiments
• Generated a DB of 40k mashlets & GPs (ProgrammableWeb has 4k)
• Based on ProgrammableWeb characteristics
Experiments for the synthetic dataset
• Varying # of total mashlets and GPs
• Varying k
• Varying # of user mashlets
• Varying GP complexity

Results (synthetic dataset) GP Complexity = 5, varying k

Results (synthetic dataset) GP Complexity = 10, varying k

Results (synthetic dataset) Varying # of user mashlets

Experiments (real dataset)
Real dataset
• Used real-life mashlets from ProgrammableWeb and IBM Mashup Center
• Scenario: development of a travel-related mashup
Experiments for quality assessment
• IBM Mashup Center as the mashup platform
• Users placed mashlets
• MatchUp offered the top-10 GPs for their mashlets
• Users searched for alternatives
Results
• User satisfaction was high
• High correlation between suggestions and users' lists
• Browsing for additional results was generally unsuccessful
• The gluing process was significantly expedited

Related Work
• Autocompletion in many other domains
  • Phrase prediction (Nandi & Jagadish, VLDB 2007)
  • File locations (Myers, CHI 2000)
• Web service composition
  • Model for WS composition (Berardi et al., VLDB 2005)
  • Optimized and customized algorithm (McIlraith and Son, KR 2002)
• Mashup assembly tools
  • MashMaker (Ennals & Garofalakis, SIGMOD 2007): data -> widgets
  • MashupAdvisor (Elmeleegy et al., ICWS 2008): mashup -> output recommendation -> assembly to achieve this output

Conclusions
• A novel autocompletion mechanism for rapid development of mashups
• Uses the collective wisdom of other users on the web
• A dedicated threshold-based top-k algorithm that reduces the search space
• PageRank-style calculation of the popularity of mashlets and glue patterns
Future Work
• Infer semantic inheritance automatically
• Distributed environment
• Incorporating context and user preferences
