Making Mashups with Marmite. Jeff Wong, Jason I. Hong. Carnegie Mellon University.
The Big Picture Problem • Lots of content out there on the web • But not always in a form amenable to your needs • Ex. Easy to get a list of hotels in San Jose, not so easy to sort by distance to convention center • Two observations: • In many cases, all of the data and services people need already exist, but not connected together • Unlikely that a web site can predict all possible needs
A Solution: Mashups • Rapidly growing community of users creating “mashups” combining content from multiple web sites • Ex. Housingmaps.com • Ex. MySpace child predators • Ex. Friendster locations • Ex. Most popular videos on YouTube, Yahoo Video, … • ProgrammableWeb.com statistics • ~1500 mashups created since April 2005 • 356 open web-based APIs available
But Creating Mashups is Hard • Requires lots of skill to create a mashup • Ex. Housingmaps creator has PhD in computer science • Ex. MySpace child predator list took months • Requires programming expertise in many areas • Web crawling • Text parsing • Pattern matching • Databases • HTML
Marmite: End-User Programming for Mashups • Main idea: make it easy to create web mashups • Use a dataflow approach connecting small operators • Inspired by Unix pipes and Apple’s Automator • Example: • Get all events from Upcoming.org • Filter out events that are too old • Put them all onto a map • Runs inside of a standard web browser
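The events example can be sketched as a chain of small operators, in the spirit of Unix pipes. This is a minimal illustration in Python, not Marmite's code: the sample events, the function names (`upcoming_events`, `drop_older_than`, `to_map_markers`, `run_flow`) and the text "markers" standing in for a map are all assumptions made for the sketch.

```python
from datetime import date


def upcoming_events():
    """Hypothetical source: made-up rows standing in for events fetched from a site."""
    return [
        {"name": "Old meetup", "date": date(2007, 1, 5), "address": "1 Example St"},
        {"name": "Design talk", "date": date(2007, 5, 2), "address": "2 Sample Ave"},
        {"name": "Hack night", "date": date(2007, 5, 9), "address": "3 Demo Rd"},
    ]


def drop_older_than(cutoff):
    """Build a filter operator that keeps only events on or after the cutoff date."""
    def operator(rows):
        return [row for row in rows if row["date"] >= cutoff]
    return operator


def to_map_markers(rows):
    """Hypothetical sink: text labels stand in for markers placed on a map."""
    return [f'{row["name"]} @ {row["address"]}' for row in rows]


def run_flow(source, *operators):
    """Feed the source's rows through each operator in order, like a pipe."""
    rows = source()
    for operator in operators:
        rows = operator(rows)
    return rows


markers = run_flow(upcoming_events, drop_older_than(date(2007, 4, 28)), to_map_markers)
print(markers)
```

Each operator only sees the rows handed to it by the previous one, which is what makes the pieces easy to rearrange.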
Using Marmite (Envisioned) • Extract content from one or more web pages • names, addresses, dates, phone #, URLs • Process it in a data flow manner • filtering out values or adding metadata • integrating with other data sources (similar to a database join operation) • Direct the output to a variety of sinks • databases, map services, text files, visualizations, web pages, or source code that can be further edited
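The "similar to a database join" step can be sketched as follows. This is an illustrative Python sketch under assumptions: the row format (a list of dictionaries), the function name `join_rows`, and the sample data are not taken from Marmite.

```python
def join_rows(left, right, key):
    """Inner-join two lists of row dictionaries on a shared key column.

    When both sides have a column with the same name, the left value is kept.
    """
    # Index the right-hand rows by key so each left row is matched in one lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**right_row, **left_row})
    return joined


# Made-up sample data: events from one source, venue details from another.
events = [
    {"event": "Design talk", "venue": "Hall A"},
    {"event": "Hack night", "venue": "Lab B"},
    {"event": "Picnic", "venue": "Park C"},
]
venues = [
    {"venue": "Hall A", "address": "1 Example St"},
    {"venue": "Lab B", "address": "2 Sample Ave"},
]

combined = join_rows(events, venues, "venue")
print(combined)
```

Rows without a match on the other side (here, the picnic) are dropped, as in an inner join; a different mashup might prefer to keep them with empty columns.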
Marmite • Motivation and Examples • Features and Design Rationale • User Evaluation
Features and Design Rationale • Conducted a series of quick evaluations to understand design space and potential problems • Automator • Lo-fi prototypes
Informal Automator Evaluation • Had three novices try three simple web-based tasks • Warm-up task • Traverse a set of web pages • Download a set of images • Some findings: • Some difficulties knowing how to start and what to do next • Little feedback about state of system between operations • Difficult to iterate due to network speed issues
Lo-Fi Prototypes • 6 paper prototypes with 20 participants
Design Solutions • Problem: how to start and what to do next • Solution: Suggest next actions • Weak data typing to find types (addresses, numbers, etc) • Filter operators to only show relevant ones • Suggest operators that might be applicable
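One way to picture weak data typing driving suggestions: guess a loose type for each column, then offer only the operators whose input type appears in the data. The patterns, type names, and operator names below are assumptions for illustration and are much cruder than a real system would need.

```python
import re
from collections import Counter

# Hypothetical, deliberately loose patterns; checked in order, first match wins.
TYPE_PATTERNS = [
    ("number", re.compile(r"^-?\d+(\.\d+)?$")),
    ("url", re.compile(r"^https?://\S+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("address", re.compile(r"^\d+\s+\w+.*\b(St|Ave|Blvd|Rd)\b", re.IGNORECASE)),
]

# Hypothetical operator registry: operator name -> input type it expects.
OPERATORS = {
    "Geocode": "address",
    "Numeric filter": "number",
    "Date filter": "date",
    "Fetch page": "url",
}


def guess_type(value):
    """Return the first loose type whose pattern matches, else 'text'."""
    for type_name, pattern in TYPE_PATTERNS:
        if pattern.search(value.strip()):
            return type_name
    return "text"


def guess_column_type(values):
    """Type a column by majority vote over its values."""
    if not values:
        return "text"
    return Counter(guess_type(v) for v in values).most_common(1)[0][0]


def suggest_operators(table):
    """List operators whose input type matches at least one column."""
    present = {guess_column_type(values) for values in table.values()}
    return sorted(name for name, wanted in OPERATORS.items() if wanted in present)


table = {
    "venue": ["5000 Forbes Ave", "417 S Craig St"],
    "price": ["10", "12.50"],
}
print(suggest_operators(table))
```

Because the typing is weak, a wrong guess only changes which suggestions appear; it does not stop the user from applying another operator.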
Design Solutions • Problem: little feedback about state of system between operations • Solution: link data flow and data view together • Many systems take program-centric view (ex. Automator) or data-centric view (ex. spreadsheets) • Use hybrid data flow / data view, showing an operation and its effects together • Data view usually “spreadsheet”, other views possible too (for example, maps)
Design Solutions • Problem: difficult to iterate due to network speeds • Solution: cache data, let people “replay” data • Reload, pause, play
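The caching idea can be sketched as a wrapper that fetches once and replays the stored rows afterwards. This Python sketch is an assumption about how such a cache might look, not Marmite's implementation; `CachedSource`, `fetch_listings`, and the sample rows are hypothetical, and pausing is not modeled.

```python
class CachedSource:
    """Wrap a slow fetch function so repeated runs replay cached rows."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = None
        self.fetch_count = 0  # how many times the slow fetch actually ran

    def reload(self):
        """Go back to the network and replace the cached rows."""
        self._cache = list(self._fetch())
        self.fetch_count += 1
        return list(self._cache)

    def play(self):
        """Return cached rows, fetching only if nothing is cached yet."""
        if self._cache is None:
            return self.reload()
        return list(self._cache)


def fetch_listings():
    """Hypothetical stand-in for a slow network request."""
    return [{"title": "Listing A"}, {"title": "Listing B"}]


source = CachedSource(fetch_listings)
first = source.play()   # triggers the one real fetch
second = source.play()  # replayed from the cache
print(source.fetch_count)
```

Downstream operators can then be tweaked and re-run against `play()` without waiting on the network each time.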
Other Design Findings • Screen real estate issues • Collapsible operators, leaving a readable label
Extracting Generic Content • Can’t have pre-defined extractor operators for every possible web site • Need a more general way of extracting data from pages • Developed a generic wizard UI for selecting links • Content from that set could be extracted via other operators • Uses Solvent (MIT), an XPath-based algorithm for finding patterns in web pages • Finds “groups” of related web content based on how HTML is structured
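To give a feel for grouping content "based on how HTML is structured", the sketch below groups links by the tag path that leads to them. It is a simplified stand-in written with Python's standard `html.parser`, not Solvent's algorithm; the class name `LinkGrouper`, the function `group_links`, and the sample page are assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinkGrouper(HTMLParser):
    """Collect link targets, grouped by an XPath-like tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.groups = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.groups["/" + "/".join(self.stack)].append(href)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def group_links(html):
    parser = LinkGrouper()
    parser.feed(html)
    parser.close()
    return dict(parser.groups)


page = """
<html><body>
<div><a href="/home">Home</a><a href="/about">About</a></div>
<ul>
  <li><a href="/hotel/1">Hotel A</a></li>
  <li><a href="/hotel/2">Hotel B</a></li>
  <li><a href="/hotel/3">Hotel C</a></li>
</ul>
</body></html>
"""
groups = group_links(page)
print(groups)
```

Links that share a path (here, the three hotel links inside list items) fall into one group, which a wizard could then offer as a single selectable set.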
Operators • Operators have input types • Operator uses this to guess which columns it wants • Operators have output types
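A sketch of how an operator might use its input type to pick a column and declare the type of what it produces. The `Operator` class, its field names, and the "domain" example operator are hypothetical; only `urllib.parse.urlparse` is a real standard-library call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class Operator:
    name: str
    input_type: str
    output_type: str
    apply: Callable[[str], str]

    def pick_column(self, column_types: Dict[str, str]) -> Optional[str]:
        """Guess the column to read: the first one whose type matches."""
        for column, column_type in column_types.items():
            if column_type == self.input_type:
                return column
        return None

    def run(self, rows: List[dict], column_types: Dict[str, str]):
        """Add one output column and record its type for later operators."""
        column = self.pick_column(column_types)
        if column is None:
            raise ValueError(f"{self.name}: no column of type {self.input_type}")
        out_column = f"{self.name}({column})"
        new_rows = [{**row, out_column: self.apply(row[column])} for row in rows]
        new_types = {**column_types, out_column: self.output_type}
        return new_rows, new_types


# Hypothetical operator: takes a URL column and emits the host name as text.
domain = Operator("domain", "url", "text", lambda u: urlparse(u).netloc)

rows = [{"title": "Listing A", "link": "http://example.com/a"}]
types = {"title": "text", "link": "url"}
new_rows, new_types = domain.run(rows, types)
print(new_rows, new_types)
```

Carrying the output type forward is what lets the next operator in the flow make its own column guess.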
Implementation • JavaScript (for underlying code) and Extensible Binding Language (XBL, for the UI) • Operators currently in JavaScript • Ideally could be scriptable in any programming language • Currently ~15 operators
Evaluation • Informal user study with 6 people • 2 novices • 2 people with spreadsheet experience (formulas) • 2 people with programming experience • Tasks (in increasing difficulty) • Warmup task showing how to retrieve a set of addresses and how to geocode an address • Search for and filter out events further than a week away • Compile a list of events from two event services and plot them on a map • Recreate the housingmaps site
Results • Three people able to complete all tasks in ~1 hour • First two users confused about suggested actions (automatically popped up, made manual for other 4 users) • Novice made some progress, not able to finish all tasks • Able to re-create housingmaps in ~15 minutes
More Results • Biggest barrier was understanding the data flow • Did not understand input and output concept • Applied operators as one-off, did not realize that it was a static representation of flow • Did not understand data flow and data view were linked
Future Directions • Short-term • Better screen-scraping operators • More operators • Better connection with web services (WSDL and REST) • Better help for starting a data flow • Long-term • Intelligence analysis • Better visualizations • Location-based services
Conclusions • Marmite, a tool for creating web-based mashups • Extract content from one or more web pages • Process it in a data flow manner • Direct the output to a variety of sinks • Hybrid data flow / data view • User evaluation shows some promising results
Jeff Wong, Jason Hong. Making Mashups with Marmite: Re-purposing Web Content through End-User Programming. CHI 2007.
Types of Operators • Sources • Add data into Marmite by querying databases, extracting information from web pages, and so on. • Processors • Modify, combine, or delete existing rows. Example operators include geocoding (converting street addresses to latitude and longitude) and filtering. Processor operators might add or remove columns as well. • Sinks • Redirect the flow of data out of Marmite. Examples include showing data on a map, saving it to a file, or writing it to a web page.
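The three operator types can be illustrated with a processor that adds columns and a sink that saves to a file-like format. This is a sketch under assumptions: the lookup table replaces a real geocoding service, its coordinates are placeholder values, and the function names and sample rows are invented for the example.

```python
import csv
import io

# Hypothetical lookup table standing in for a geocoding service.
# The coordinates are placeholders, not real locations.
FAKE_GEOCODES = {
    "1 Example St": (10.0, 20.0),
    "2 Sample Ave": (11.0, 21.0),
}


def listings_source():
    """Source: adds rows into the flow (made-up sample data)."""
    return [
        {"name": "Listing A", "address": "1 Example St"},
        {"name": "Listing B", "address": "2 Sample Ave"},
        {"name": "Listing C", "address": "unknown"},
    ]


def geocode(rows):
    """Processor: adds lat/lon columns and drops rows it cannot geocode."""
    out = []
    for row in rows:
        coords = FAKE_GEOCODES.get(row["address"])
        if coords is None:
            continue
        out.append({**row, "lat": coords[0], "lon": coords[1]})
    return out


def csv_sink(rows):
    """Sink: sends rows out of the flow as CSV text."""
    buffer = io.StringIO()
    if rows:
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return buffer.getvalue()


csv_text = csv_sink(geocode(listings_source()))
print(csv_text)
```

The processor here both removes a row and adds two columns, which matches the note that processors may change the shape of the table as well as its contents.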