Efficiently Incorporating User Feedback into Information Extraction and Integration Programs
Xiaoyong Chai, Ba-Quy Vuong, AnHai Doan, Jeffrey F. Naughton
University of Wisconsin-Madison
The Need for Incorporating User Feedback
[Figure: example containing the label "Panels Chair"]
Current Approach
[Figure: diagram with the labels "Code", "Data", …]
This Is Not Just For DBLife
• A growing number of applications use information extraction (IE) and information integration (II)
  • Avatar@IBM Almaden
  • AliBaba@Humboldt Univ. of Berlin
  • YAGO@MPI
  • Kylin@Univ. of Washington
  • …
• A systematic user-feedback solution could significantly benefit them
What User Feedback To Incorporate?
• Types of User Feedback
  • Flagging an Error
  • Fixing an Error
    • Editing Code
    • Editing Data
      • Input
      • Output
      • Intermediate Results
Challenges
• How to expose program data for user feedback?
• How to incorporate user feedback?
• How to efficiently execute a program?
Exposing Program Data for User Feedback
• Extracting conference services
[Figure: the extraction workflow. dataSources(url, date) feeds crawl; extractConf and extractNames run on the crawled pages; findRoles produces roles(name, role, page); a final step produces services(name, conf, role). Views over dataSources, roles, and services are exposed through Form, Spreadsheet, and Wiki user interfaces, respectively. Sample tuples: dataSources(http://.../cidr09/, 09/01/2008); services(Joe Hellerstein, CIDR 2009, PC Chair).]
Writing User-Feedback Rules to Expose Program Data
• Write extraction program, e.g., in xlog [Shen et al, 07]
  R1: pages(page) : dataSources(url, date), crawl(url, page)
  R2: conferences(conf, page) : pages(page), extractConf(page, conf)
  R3: names(name, page) : pages(page), extractNames(page, name)
  R4: roles(name, role, page) : names(name, page), findRoles(name, page, role)
  R5: services(name, conf, role) : conferences(conf, page), roles(name, role, page)
• Write user-feedback rules to specify views and user interfaces
  R6: dataSourcesForUserFeedback(url)#form-UI : dataSources(url, date), date >= "01/01/2009"
  R7: rolesForUserFeedback(name, role, page#no-edit)#spreadsheet-UI : roles(name, role, page)
  R8: servicesForUserFeedback(name, conf, role)#wiki-UI : services(name, conf, role)
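To make the data flow of rules R1 to R5 and the view in R6 concrete, here is a minimal Python sketch. It is not xlog and not the authors' implementation: the `WEB` dictionary, the example URLs, the regular expressions inside the stub operators, and the MM/DD/YYYY date format are all assumptions made only for illustration.

```python
import re
from datetime import datetime

# Hypothetical stand-in for the Web: url -> page text (assumption, not from the slides).
WEB = {"http://example.org/cidr/": "CIDR 2009. PC Chair: Joe Hellerstein"}

def crawl(url):
    """Stub for the crawl operator: returns the pages found at url."""
    return [WEB[url]] if url in WEB else []

def extract_conf(page):
    """Stub IE operator: conference names such as 'CIDR 2009'."""
    return re.findall(r"\b[A-Z]+ \d{4}\b", page)

def extract_names(page):
    """Stub IE operator: 'Firstname Lastname' mentions."""
    return re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", page)

def find_roles(name, page):
    """Stub operator: the role label written just before the name."""
    return re.findall(r"([A-Z][A-Za-z ]+): " + re.escape(name), page)

def run_program(data_sources):
    """Evaluates R1-R5 bottom-up, one table per rule head."""
    pages = {p for url, _date in data_sources for p in crawl(url)}               # R1
    conferences = {(c, p) for p in pages for c in extract_conf(p)}               # R2
    names = {(n, p) for p in pages for n in extract_names(p)}                    # R3
    roles = {(n, r, p) for n, p in names for r in find_roles(n, p)}              # R4
    services = {(n, c, r) for c, cp in conferences
                for n, r, rp in roles if cp == rp}                               # R5
    return {"pages": pages, "roles": roles, "services": services}

def data_sources_for_user_feedback(data_sources, cutoff="01/01/2009"):
    """R6: expose only the urls whose date is on or after the cutoff."""
    parse = lambda d: datetime.strptime(d, "%m/%d/%Y")  # assumes MM/DD/YYYY
    return {url for url, date in data_sources if parse(date) >= parse(cutoff)}

DATA_SOURCES = [("http://example.org/cidr/", "09/01/2008"),
                ("http://example.org/new/", "02/15/2009")]
tables = run_program(DATA_SOURCES)
exposed = data_sources_for_user_feedback(DATA_SOURCES)
```

Each rule head becomes one materialized table, which is what lets a user-feedback rule select any of those tables (input, intermediate, or output) as a view for editing.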
Program Semantics
[Figure: the workflow diagram with its views and user interfaces. services(name, conf, role) is exposed through the Wiki, roles(name, role, page) through the Spreadsheet, and dataSources(url, date) through the Form; the operators are crawl, extractConf, extractNames, and findRoles. Sample tuples: services(Joe Hellerstein, CIDR 2009, PC Chair); dataSources(http://.../cidr09/, 09/01/2008).]
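Incorporating Previous User Feedback
• Interpretation: for operator p, if t is in the output, change t into t’
• Example: change “A. Smith” to “D. Smith” in the output of extractNames
[Figure: operator p maps input I to output O; applying the feedback t → t’ yields O’. In the example, extractNames reads text such as “… Dr. A. Smith is …” and, after feedback, outputs D. Smith, A. Jones, ...]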
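The interpretation above (if t is in the output of p, change t into t’) can be sketched as a wrapper around an operator. This is an illustrative sketch only: `extract_names` is a toy regex-based stand-in for the real extractNames operator, and `with_feedback` is a hypothetical helper name, not something from the slides.

```python
import re
from typing import Callable, Dict, Set

def extract_names(text: str) -> Set[str]:
    """Toy extractor (assumption): matches 'X. Surname' patterns."""
    return set(re.findall(r"\b[A-Z]\. [A-Z][a-z]+", text))

def with_feedback(op: Callable[[str], Set[str]],
                  edits: Dict[str, str]) -> Callable[[str], Set[str]]:
    """Wraps operator op so that every output tuple t listed in edits
    is replaced by its corrected value t'."""
    def patched(text: str) -> Set[str]:
        return {edits.get(t, t) for t in op(text)}
    return patched

text = "Dr. A. Smith is on the panel with A. Jones."
fixed_extract = with_feedback(extract_names, {"A. Smith": "D. Smith"})
corrected = fixed_extract(text)
```

Storing the edit next to the operator, rather than patching the output table once, means the correction is reapplied each time the program is rerun.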
Interpreting User Feedback Based On Tuple Provenance
• Provenance of output tuple t: the set of input tuples that operator p used to produce t
• Feedback: change “A. Smith” to “D. Smith”
• Interpretation: if the operator produces {“A. Smith”, “A. Jones”} from {p1}, then replace {“A. Smith”, “A. Jones”} with {“D. Smith”, “A. Jones”}
[Figure: extractNames applied to pages p1 and p2, with output tuples labeled by the page (p1 or p2) they came from.]
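A provenance-scoped version of the same idea can be sketched as follows. Feedback is stored per input page together with the output it was given against, and it is applied only when the operator produces that same output from that same page. What happens when the output differs is not stated on the slide; skipping the feedback in that case is an assumption of this sketch, as are the toy extractor and all names below.

```python
import re
from typing import Dict, FrozenSet, Tuple

def extract_names(text: str) -> FrozenSet[str]:
    """Toy extractor (assumption): matches 'X. Surname' patterns."""
    return frozenset(re.findall(r"\b[A-Z]\. [A-Z][a-z]+", text))

# page id -> (output the user saw, output the user corrected it to)
Feedback = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]

def run_with_provenance(pages: Dict[str, str],
                        feedback: Feedback) -> Dict[str, FrozenSet[str]]:
    """Runs the extractor page by page, so each output set has a single
    page as its provenance, and applies matching feedback."""
    results = {}
    for page_id, text in pages.items():
        produced = extract_names(text)
        rule = feedback.get(page_id)
        # Apply the correction only if the same output arises from the same page.
        if rule is not None and produced == rule[0]:
            produced = rule[1]
        results[page_id] = produced
    return results

pages = {"p1": "Dr. A. Smith is on the panel with A. Jones.",
         "p2": "A. Smith gave the keynote."}
feedback = {"p1": (frozenset({"A. Smith", "A. Jones"}),
                   frozenset({"D. Smith", "A. Jones"}))}
result = run_with_provenance(pages, feedback)
```

Note that the "A. Smith" extracted from p2 is left alone: the correction is tied to p1, not to the string itself.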
Challenges
• How to expose program data for user feedback?
• How to incorporate user feedback?
• How to efficiently execute a program?
  • Incremental execution
  • Improved concurrency control
Incrementally Executing the Program
• extractNames(I + ΔI) = extractNames(I) + extractNames(ΔI) ?
• Similar problem in incremental view maintenance
• Incremental-update properties
  • Closed-form insertion
  • Closed-form deletion
  • Input partitionability
  • Partition correlation
  • Attribute independence
[Figure: extractNames reading pages p1, p2, and p3 and producing (name, page) tuples.]
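The question on this slide, whether extractNames(I + ΔI) can be computed from extractNames(I) and extractNames(ΔI), has a simple answer when the operator handles each page independently. The sketch below assumes exactly that; it is not a definition of the incremental-update properties listed above, and the class and method names are hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, Set, Tuple

def extract_names(text: str) -> Set[str]:
    """Toy extractor (assumption): matches 'X. Surname' patterns."""
    return set(re.findall(r"\b[A-Z]\. [A-Z][a-z]+", text))

class IncrementalExtractor:
    """Caches the operator output per input page so that inserting or
    deleting pages never re-runs the operator on untouched pages."""

    def __init__(self, extract: Callable[[str], Set[str]]):
        self.extract = extract
        self.by_page: Dict[str, Set[str]] = {}
        self.calls = 0  # number of operator invocations, for illustration

    def insert(self, new_pages: Dict[str, str]) -> None:
        # extractNames(I + dI) = extractNames(I) + extractNames(dI)
        for page_id, text in new_pages.items():
            self.calls += 1
            self.by_page[page_id] = self.extract(text)

    def delete(self, page_ids: Iterable[str]) -> None:
        # Drop exactly the outputs whose provenance is a deleted page.
        for page_id in page_ids:
            self.by_page.pop(page_id, None)

    def output(self) -> Set[Tuple[str, str]]:
        return {(n, pid) for pid, names in self.by_page.items() for n in names}

inc = IncrementalExtractor(extract_names)
inc.insert({"p1": "A. Smith and A. Jones.", "p2": "B. Lee."})
inc.insert({"p3": "C. Wu."})      # only p3 is processed here
inc.delete(["p1"])                # no operator call needed
```

The operator is invoked three times in total, once per page, even though the input changed twice after the initial run.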
Concurrently Executing Transactions
• Table-Locking: locks only the input and output tables of the crawl operator
• Operator-Skipping: skips executing the join operator after updating the roles table
[Figure: the workflow (dataSources, crawl, extractConf, extractNames, findRoles, roles, services) with transaction T1 marked at dataSources/crawl and transaction T2 marked at roles. Sample tuples: dataSources(http://.../cidr09/, 09/01/2008); services(Joe Hellerstein, CIDR 2009, PC Chair).]
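The table-locking idea (lock only the input and output tables of the operator being executed, rather than the whole database) can be sketched with one lock per table. This is an assumption-laden illustration, not the authors' concurrency-control protocol: the table list, the `run_operator` helper, and the fixed lock-acquisition order are choices made for this sketch, and operator-skipping is not modeled.

```python
import threading
from contextlib import ExitStack
from typing import Callable, Dict, List

TABLES = ["dataSources", "pages", "conferences", "names", "roles", "services"]
LOCKS: Dict[str, threading.Lock] = {t: threading.Lock() for t in TABLES}

def run_operator(op: Callable[[], object],
                 inputs: List[str], outputs: List[str]) -> object:
    """Runs op while holding locks on its input and output tables only.
    Locks are taken in sorted order so two transactions cannot deadlock
    by acquiring the same pair of tables in opposite orders."""
    with ExitStack() as stack:
        for table in sorted(set(inputs) | set(outputs)):
            stack.enter_context(LOCKS[table])
        return op()

def probe() -> Dict[str, bool]:
    """Stand-in for the crawl operator: reports which tables are locked."""
    return {t: LOCKS[t].locked() for t in TABLES}

# While "crawl" runs, only dataSources and pages are locked, so another
# transaction that edits the roles table can proceed concurrently.
held = run_operator(probe, inputs=["dataSources"], outputs=["pages"])
```

All locks are released when `run_operator` returns, because `ExitStack` unwinds every lock it entered.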
Experiment Setup
• Testbed
  • A 5-stage DBLife workflow
  • 13 blackbox operators: 6 IE operators and 3 II operators
  • Wrote xlog program and user-feedback rules in < 1 hr
• Simulated user-feedback transactions
  • On each stage of the workflow
  • Each transaction randomly deletes, inserts, or modifies 1/10 of the tuples in a table
Table-Locking and Operator-Skipping Improve Concurrency Degree
• Increase transaction throughput by 50% and 500%
• Reduce transaction response time by 43% and 98%
Related Work
• User feedback in IE and II [Doan et al, 01], [Chiticariu et al, 08], [Jeffery et al, 08]
  • Leveraging user feedback to improve results of individual operations
• Provenance [Woodruff & Stonebraker, 97], [Cui & Widom, 01], [Buneman et al, 01], [Bohannon et al, 08], [Huang et al, 08]
• Incremental execution
  • View maintenance [Blakeley et al, 86], [Griffin & Libkin, 95], [Gupta & Mumick, 95]
  • Schema matching [Bernstein et al, 06], IE [Chen et al, 07]
Conclusions and Future Work
• Incorporating user feedback into IE and II programs is important
• Identify key issues and provide initial solutions:
  • Write user-feedback rules to expose program data to UIs
  • Model and incorporate user feedback
  • Efficiently execute program to process user feedback
• Future work:
  • Handle unreliable user feedback
  • Propagate user feedback down in the workflow
  • Conduct user study