Tool Support for Data Validation by End-User Programmers

Christopher Scaffidi, Brad Myers, Mary Shaw. Carnegie Mellon University.

Presentation Transcript


  1. Tool Support for Data Validation by End-User Programmers • Christopher Scaffidi, Brad Myers, Mary Shaw • Carnegie Mellon University

  2. Target audience: End-user programmers • In 2012, in American workplaces • 90 million computer end users • 55 million of whom will create • Spreadsheets • Databases • Web applications • Outline: Introduction, Topes, Demonstration, Conclusion

  3. An input validation problem observed during contextual inquiry • Valid? “EDSH 225” • Questionable? “EDXH 225” • Valid but wrong format? “Smith 225” • Or obviously invalid? “Robotics Institute”

  4. Underlying problem: abstraction mismatch • Tools support strings, ints, floats, sometimes dates. • Problem domain involves higher-level categories: • University names • Person names • CMU phone numbers • CMU room numbers • These data categories are: • Short human-readable strings • Multi-format • Sometimes ambiguous (non-binary scale of validity) • Often particular to certain groups of people

  5. Limitations of existing approaches • Types do not support questionable values • Grammars do not, either, nor can they reformat • Information extraction algorithms rely on grammatical cues that are absent during validation • Cues, Forms/3, -calculus, Slate, pollution markers, etc., infer numerical constraints but not constraints on strings, nor are they platform-independent

  6. New Approach: Topes • A tope = a platform-independent abstraction describing how to recognize and transform strings in one category of data • Greek word for “place,” because each corresponds to a data category with a natural place in the problem domain

  7. A tope is a graph. Node = format, edge = transformation • Notional representation for a CMU room number tope… • Formal building name & room number: Elliot Dunlap Smith Hall 225 • Building abbreviation & room number: EDSH 225 • Colloquial building name & room number: Smith 225
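The graph on this slide can be sketched in a few lines of Python. This is only an illustration, not the Topes SDK: the `Tope` class, the format names, the edge directions, and the two lookup tables are assumptions, and the tables contain only the single building that appears on the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


# Hypothetical structure for illustration; not the Topes SDK API.
@dataclass
class Tope:
    name: str
    # Nodes of the graph: one entry per format.
    formats: Set[str] = field(default_factory=set)
    # Edges of the graph: (source format, target format) -> transformation.
    edges: Dict[Tuple[str, str], Callable[[str], str]] = field(default_factory=dict)

    def add_edge(self, src: str, dst: str, trf: Callable[[str], str]) -> None:
        """Register both formats as nodes and the transformation as an edge."""
        self.formats.update({src, dst})
        self.edges[(src, dst)] = trf

    def transform(self, value: str, src: str, dst: str) -> str:
        """Follow a single edge; raises KeyError if no such edge exists."""
        return self.edges[(src, dst)](value)


# Assumed lookup tables, built only from the example on the slide.
FORMAL_TO_ABBREV = {"Elliot Dunlap Smith Hall": "EDSH"}
COLLOQUIAL_TO_ABBREV = {"Smith": "EDSH"}


def rename_building(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a transformation that swaps the building name and keeps the room."""
    def trf(value: str) -> str:
        building, room = value.rsplit(" ", 1)
        return f"{table[building]} {room}"
    return trf


room_tope = Tope("CMU room number")
room_tope.add_edge("formal", "abbreviation", rename_building(FORMAL_TO_ABBREV))
room_tope.add_edge("colloquial", "abbreviation", rename_building(COLLOQUIAL_TO_ABBREV))

print(room_tope.transform("Elliot Dunlap Smith Hall 225", "formal", "abbreviation"))
print(room_tope.transform("Smith 225", "colloquial", "abbreviation"))
```

Keeping transformations on edges rather than inside the formats means a new format only needs one edge to an existing node to become reachable from it.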

  8. A tope is a conceptual abstraction. A tope implementation is code. • Each tope implementation has executable functions: • 1 $\mathrm{isa}: \mathrm{string} \to [0,1]$ function per format, for recognizing instances of the format (a fuzzy set) • 0 or more $\mathrm{trf}: \mathrm{string} \to \mathrm{string}$ functions linking formats, for transforming values from one format to another • Validation function: $\phi(str) = \max_f \mathrm{isa}_f(str)$, where $f$ ranges over the tope’s formats • Valid when $\phi(str) = 1$ • Invalid when $\phi(str) = 0$ • Questionable when $0 < \phi(str) < 1$
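As a rough sketch of the validation function on this slide, the Python below takes the maximum of the per-format isa scores and maps the result to valid, questionable, or invalid. Everything specific is an assumption made for illustration: the regular expressions, the known-building sets, and the score of 0.5 for a well-formed value with an unrecognized building are not taken from the Topes tools.

```python
import re
from typing import Callable, Dict

# Assumed reference data: only the names that appear in the slides.
KNOWN_ABBREVIATIONS = {"EDSH"}
KNOWN_COLLOQUIAL = {"Smith"}


def isa_abbreviation(s: str) -> float:
    """Score for the 'building abbreviation & room number' format."""
    m = re.fullmatch(r"([A-Z]{2,5}) (\d{3,4})", s)
    if m is None:
        return 0.0
    # Well-formed but unknown building: questionable (0.5 is an arbitrary choice).
    return 1.0 if m.group(1) in KNOWN_ABBREVIATIONS else 0.5


def isa_colloquial(s: str) -> float:
    """Score for the 'colloquial building name & room number' format."""
    m = re.fullmatch(r"([A-Z][a-z]+) (\d{3,4})", s)
    if m is None:
        return 0.0
    return 1.0 if m.group(1) in KNOWN_COLLOQUIAL else 0.5


ISA_FUNCTIONS: Dict[str, Callable[[str], float]] = {
    "abbreviation": isa_abbreviation,
    "colloquial": isa_colloquial,
}


def validate(s: str) -> float:
    """Validation score: the maximum isa score over all formats."""
    return max(isa(s) for isa in ISA_FUNCTIONS.values())


def classify(score: float) -> str:
    """Map a score in [0, 1] to the three outcomes named on the slide."""
    if score == 1.0:
        return "valid"
    if score == 0.0:
        return "invalid"
    return "questionable"


for example in ["EDSH 225", "EDXH 225", "Smith 225", "Robotics Institute"]:
    print(example, "->", classify(validate(example)))
```

Recording which format produced the maximum score, rather than only the score, would also let a caller flag the "valid but wrong format" case by comparing that format with the one the application expects.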

  9. Today’s demonstration (using our latest version) • Create phone number tope • Infer boilerplate from examples • What are formats, parts, and constraints? • Label parts, add/fix constraints, test in tool • Validate spreadsheet data • Transform spreadsheet data • Reuse phone number tope • Create web application • Attach tope-based validator, configure, execute • Valid / invalid / questionable / valid-but-misformatted

  10. Contributions highlighted today • A model for data... • Short, human-readable strings • Ambiguous categories • Multiple formats • Implementation features: • Inference of customizable formats from examples • Soft constraints • Human-readable error messages • Validation code is reusable across platforms

  11. Other contributions not highlighted today • Validating with topes (quantitatively) improves… • Accuracy of validation • Reusability of validation code • Subsequent duplicate identification • Additional tool features: • Inter-tope reference (i.e., “topes in topes”) • Whitelists • Various additional auto-transformation features • Overriding auto-transformation with JavaScript

  12. Validation and Tool Maturity • Expressiveness • Have implemented dozens of topes • Usability • Fast creation of accurate formats by users in study • Usefulness • Integrated w/ Excel, Visual Studio, and an XML library • Integrated by IBM & Univ. Nebraska into other tools

  13. Thank You… • To Jeff Magee, Betty Cheng, Barbara Ryder, Margaret Burnett, and others at ICSE 2007 for early feedback • To NSF for funding • To ICSE 2008 for this opportunity to present

  14. Available for download • http://www.cs.cmu.edu/~cscaffid/software.shtml • Or Google for "Topes SDK"
