Prof. Jason Hong, Carnegie Mellon University. Rapid End-User Programming and Visualization for the Web. IDA Session 5, 2007 CS Study Panel, 24 April 2008. Principal Investigator. Research Areas: End-User Programming; Extracting and visualizing data from web
Prof. Jason Hong, Carnegie Mellon University • Rapid End-User Programming and Visualization for the Web • IDA Session 5 • 2007 CS Study Panel • 24 April 2008
Principal Investigator • Jason Hong, Assistant Professor, Human-Computer Interaction Institute, Carnegie Mellon University • PhD: University of California, Berkeley • Contact Information: School of Computer Science, Carnegie Mellon University, 2504D Newell-Simon Hall, 5000 Forbes Ave • Tel: (412) 268 1251 • Fax: (412) 268 1266 • E-mail: jasonh@cs.cmu.edu • Web: http://www.cs.cmu.edu/~jasonh • Research Areas • End-User Programming: extracting and visualizing data from the web • Usable Privacy and Security: anti-phishing (training, detection); managing privacy and security policies • Mobile Computing: location-based services; context-aware computing • Potential Military Applications • Tools for rapidly integrating data and web services • Better visualizations of large data sets • Effective training for security • Automated algorithms for detecting phishing scams • Better interfaces for managing security
30,000-Foot View • High-level problems observed: • Stovepipes - Data and services spread over multiple systems • Agility - Integration takes months or years • Overload - Too much information to easily process • Goal: Make it easy for people to visualize and process data gathered from a variety of sources • Information extraction + visualization + machine learning • No PhD required • Analogies: • Spreadsheets • Visual Basic
Mashups as Key Focus Area • More specifically, provide an end-user programming tool that makes it easy to create mashups • Mashups are applications that combine content and services from multiple web sites • Ex. Craigslist.com + Google Maps = Housingmaps.com
Other Example Mashups • Ex. MySpace child predators • Ex. Locations of friends on MySpace or Facebook • Common themes • Aggregating multiple sources (web pages, databases, etc.) • Handling multiple data formats (not designed to be shared) • Processing the data (filtering, summarizing, etc.) • Supporting multiple forms of output (graphs, maps, lists)
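The common themes listed on this slide (aggregate, reconcile formats, process, output) can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in: the listings, the coordinates, and the function names `load_listings` and `mashup` are invented for illustration, and no real Craigslist or Google Maps interface is called.

```python
import csv
import io
import json

# Hypothetical source 1: housing listings as CSV text (stand-in for scraped pages).
LISTINGS_CSV = """id,address,rent
1,5000 Forbes Ave,1200
2,417 S Craig St,950
3,100 Main St,2100
"""

# Hypothetical source 2: geocoding results as JSON (stand-in for a map service).
# The coordinates are illustrative values, not verified locations.
GEOCODES_JSON = """{
  "5000 Forbes Ave": [40.4433, -79.9436],
  "417 S Craig St": [40.4445, -79.9487]
}"""


def load_listings(text):
    """Parse CSV rows into dicts with numeric id and rent."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(r["id"]), "address": r["address"], "rent": int(r["rent"])}
        for r in rows
    ]


def mashup(listings, geocodes, max_rent):
    """Filter listings by rent and attach coordinates where the map source has them."""
    result = []
    for item in listings:
        if item["rent"] > max_rent:
            continue  # processing step: filter
        coords = geocodes.get(item["address"])
        if coords is None:
            continue  # the second source cannot place this listing on a map
        result.append({**item, "lat": coords[0], "lon": coords[1]})
    return result


listings = load_listings(LISTINGS_CSV)   # aggregate source 1 (CSV)
geocodes = json.loads(GEOCODES_JSON)     # aggregate source 2 (JSON)
pins = mashup(listings, geocodes, max_rent=1500)

# Output step: a plain list here; a map or graph view would consume the same records.
for pin in pins:
    print(f'{pin["address"]}: ${pin["rent"]} at ({pin["lat"]}, {pin["lon"]})')
```

Even this toy version touches parsing, joining, filtering, and rendering, which is the breadth of skills the next slide describes.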
Creating Mashups is Difficult • Requires lots of skill to create a mashup • Ex. Housingmaps creator has PhD in computer science • Ex. MySpace predator list took months of custom coding • Requires programming expertise in many areas • Web crawling • Text parsing and pattern matching • Web services (WSDL and REST) • Databases • HTML • Can we accelerate this process to a matter of days or hours for non-experts?
End-User Programming • Haggis, an end-user programming tool • Rapidly extract and combine data from multiple sources • Quickly create high-quality interfaces and visualizations • Use programming-by-example techniques to specify what is normal and what is anomalous
1. Extract data from multiple sources • Improved wizards for extracting data from web pages • Can specify example of desired links, system generalizes • Better support for other patterns on web • Tables, street addresses, etc • Support for real-time data • Weather, traffic, stocks, any web page periodically updated • Sensor Andrew, sensor network being deployed at CMU • Electrical usage, water usage, etc
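One way to picture "specify example of desired links, system generalizes" is the sketch below. It is not Haggis's algorithm, which the slides do not describe; it assumes a deliberately simple heuristic (treat every run of digits in the example links as a wildcard), and the page content and the names `LinkCollector` and `generalize` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: three item links plus two unrelated links.
PAGE = """
<a href="/item/101.html">Lamp</a>
<a href="/item/102.html">Chair</a>
<a href="/about.html">About</a>
<a href="/item/250.html">Desk</a>
<a href="/help/7.html">Help</a>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def generalize(examples):
    """Build one regex from example links by turning each digit run into a wildcard."""
    patterns = {
        re.sub(r"\d+", lambda m: r"\d+", re.escape(example))
        for example in examples
    }
    if len(patterns) != 1:
        raise ValueError("examples do not share one digit-wildcard pattern")
    return re.compile(patterns.pop())


collector = LinkCollector()
collector.feed(PAGE)

# The user points at two links; the pattern generalizes to the rest.
pattern = generalize(["/item/101.html", "/item/102.html"])
matches = [href for href in collector.links if pattern.fullmatch(href)]
print(matches)
```

A real wizard would need richer generalization (page structure, link text, surrounding layout), but the interaction is the same: a few examples in, a reusable extraction rule out.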
2. Interfaces and Visualizations • Wizards for supporting common UI patterns • Table views, maps, graph views, alerts, etc • Programming-by-example techniques
2. Interfaces and Visualizations • Output as a web page or desktop widget • Yahoo Widgets, Google Desktop, Windows Sidebar
3. Normal versus Anomalous • Problem: Too much data, gets dropped on floor • Solution: “Teach” the system what patterns to look for • Analyst-in-the-loop: infoviz + machine learning • Long-term goal • Example: • eBay “penny sellers”, could create custom software, but slow • Analyst uses visualization to find some examples of penny sellers and gives hints to system as to why • System finds more suspects, analyst gives relevance feedback • As new data streams in, system can flag suspects • Can help address high turnover rate at intelligence agencies, loss of organizational memory
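The analyst-in-the-loop cycle in the penny-seller example can be sketched as a relevance-feedback loop. This is only an illustration under stated assumptions: the slides do not say which learning method would be used, so the sketch substitutes a simple nearest-centroid score, and the seller data, the two features, and the names `centroid` and `rank_suspects` are hypothetical.

```python
from math import dist  # available in Python 3.8+

# Hypothetical sellers with two features already scaled to [0, 1]:
# (typical sale price, number of sales relative to other sellers).
SELLERS = {
    "a": (0.05, 0.90),
    "b": (0.10, 0.85),
    "c": (0.80, 0.10),
    "d": (0.02, 0.95),
    "e": (0.90, 0.05),
    "f": (0.40, 0.50),
}


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def rank_suspects(sellers, suspects, cleared):
    """Rank unlabeled sellers, most suspect-like first.

    Score = distance to the cleared centroid minus distance to the suspect centroid.
    """
    pos = centroid([sellers[s] for s in suspects])
    neg = centroid([sellers[s] for s in cleared])

    def score(name):
        return dist(sellers[name], neg) - dist(sellers[name], pos)

    unlabeled = [s for s in sellers if s not in suspects and s not in cleared]
    return sorted(unlabeled, key=score, reverse=True)


# Round 1: the analyst found one penny seller ("a") and one normal seller ("c").
suspects, cleared = {"a"}, {"c"}
round1 = rank_suspects(SELLERS, suspects, cleared)

# Relevance feedback: the analyst confirms the top hit and clears a borderline one.
suspects.add(round1[0])
cleared.add("f")
round2 = rank_suspects(SELLERS, suspects, cleared)

print("round 1:", round1)
print("round 2:", round2)
```

Each round of feedback moves the two centroids, so the ranking of the remaining sellers reflects what the analyst has taught the system so far; new sellers arriving later could be scored with the same function.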
Current Progress • First round of interviews completed • Sensor Andrew team (Civil and Electrical Engineers) • Mashup Camp • Programmers around CMU • Initial prototype of “plumbing” in progress • An Integrated Development Environment (IDE) for programmers, to facilitate extraction and visualization of data • Low-level support for extracting data from tables, basic visualizations, etc • Higher-level tools later to be built on top • First round of user tests planned for August
Past Work with Marmite • Wizard for extracting data from arbitrary web pages • Combine operators together in a dataflow (like Unix pipes) • View the data in multiple ways (table, map)
How Marmite Works • Wizard for getting data from web pages • Combine operators together in a dataflow (like Unix pipes) • View the data in multiple ways (table, map)
How Marmite Works • Operators let you know what operations can be done • Input, processing, output
How Marmite Works • Operators are chained together in a dataflow (like Unix pipes)
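The input, processing, and output operators chained in a dataflow can be sketched with Python generators, where each operator consumes the rows produced by the previous one. The operator names (`source`, `where`, `select`, `as_table`) and the event data are hypothetical and are not Marmite's actual operators.

```python
# Hypothetical rows, standing in for data a wizard extracted from a web page.
EVENTS = [
    {"name": "Concert", "city": "Pittsburgh", "price": 20},
    {"name": "Play", "city": "Cleveland", "price": 35},
    {"name": "Lecture", "city": "Pittsburgh", "price": 0},
]


def source(rows):
    """Input operator: emit rows one at a time."""
    yield from rows


def where(rows, predicate):
    """Processing operator: pass through only rows matching the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def select(rows, *fields):
    """Processing operator: keep only the named fields of each row."""
    for row in rows:
        yield {field: row[field] for field in fields}


def as_table(rows):
    """Output operator: render each row as a tab-separated text line."""
    return ["\t".join(str(value) for value in row.values()) for row in rows]


# Chain the operators, reading inside-out like a Unix pipeline:
# source | where city == Pittsburgh | select name, price | as_table
table = as_table(
    select(
        where(source(EVENTS), lambda row: row["city"] == "Pittsburgh"),
        "name",
        "price",
    )
)
print("\n".join(table))
```

Because every operator takes rows in and gives rows out, a different output operator (for example a map view) could replace `as_table` without touching the earlier stages.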
How Marmite Works • Current data is shown
How Marmite Works • And multiple views too
How Marmite Works • A wizard UI for helping people get the data they want
Some High-Level Design Issues • Centralized model • Clean data model: well-managed, well-formatted, common representations, well-known databases, etc • Decentralized model • “Anarchic”, multiple data formats in multiple places • Hard to get lots of people to agree on data format and representation • More likely scenario (look at how databases are used today) • Haggis is being designed for this model, assuming that a person may have to clean up the data and resolve formats
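In the decentralized model, cleaning up data and resolving formats might look like the sketch below: records from different places use different field names and date formats, and a small normalization step maps them onto one representation. The field aliases, date formats, and sample records are assumptions made up for illustration.

```python
from datetime import datetime

# Assumed date formats seen across sources; real data would need more.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")

# Assumed mapping from source-specific field names to a common name.
FIELD_ALIASES = {
    "ts": "date",
    "when": "date",
    "temp_f": "temperature_f",
    "temperature": "temperature_f",
}


def parse_date(text):
    """Try each known format in turn and return the first date that parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize(record):
    """Rename fields to the common names and coerce values to common types."""
    out = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    out["date"] = parse_date(out["date"])
    out["temperature_f"] = float(out["temperature_f"])
    return out


# Three hypothetical sources describing the same reading in three shapes.
RAW = [
    {"ts": "2008-04-24", "temp_f": "61"},
    {"when": "04/24/2008", "temperature": 61.0},
    {"date": "24.04.2008", "temperature_f": "61.0"},
]

cleaned = [normalize(record) for record in RAW]
for record in cleaned:
    print(record)
```

The alias table and format list are exactly the kind of knowledge a person may have to supply by hand, which is why the tool assumes a human in the clean-up step rather than a globally agreed schema.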
Other High-Level Design Issues • Discovery • What data sources are available? • May need some kind of centralized store that describes these (sort of like DNS for the Internet) • Security • Access control: who can access what data sources? • This is a general problem with sensor data • Privacy • What kinds of queries / apps should people be able to run? • Unclear how to restrict those in practice
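The discovery and access-control ideas can be sketched together as a small in-memory registry: sources are registered under a name (the DNS-like lookup), and each entry lists which roles may use it. The class `SourceRegistry`, the role names, and the example.org URLs are all hypothetical; the slides do not specify a design, and the privacy question of which queries to allow is left open here as it is in the talk.

```python
class SourceRegistry:
    """A toy centralized store describing data sources and who may access them."""

    def __init__(self):
        self._sources = {}

    def register(self, name, url, allowed_roles):
        """Record where a source lives and which roles may use it."""
        self._sources[name] = {"url": url, "allowed_roles": frozenset(allowed_roles)}

    def discover(self, role):
        """List the names of sources visible to the given role."""
        return sorted(
            name
            for name, entry in self._sources.items()
            if role in entry["allowed_roles"]
        )

    def resolve(self, name, role):
        """Return the source URL, or raise if the role is not allowed."""
        entry = self._sources[name]  # raises KeyError for unknown names
        if role not in entry["allowed_roles"]:
            raise PermissionError(f"role {role!r} may not access {name!r}")
        return entry["url"]


registry = SourceRegistry()
registry.register("weather", "http://example.org/weather", {"guest", "analyst"})
registry.register("power_usage", "http://example.org/power", {"analyst"})

print(registry.discover("guest"))
print(registry.discover("analyst"))
print(registry.resolve("power_usage", "analyst"))
```

Calling `registry.resolve("power_usage", "guest")` raises `PermissionError`, which is the simplest form of the "who can access what data sources" check.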