Tool Support for Data Validation by End-User Programmers
Christopher Scaffidi, Brad Myers, Mary Shaw
Carnegie Mellon University
Target audience: End-user programmers
• In 2012, in American workplaces
  • 90 million computer end users
  • 55 million of whom will create
    • Spreadsheets
    • Databases
    • Web applications
Introduction Topes Demonstration Conclusion
An input validation problem observed during contextual inquiry
• Valid? “EDSH 225”
• Questionable? “EDXH 225”
• Valid but wrong format? “Smith 225”
• Or obviously invalid? “Robotics Institute”
Underlying problem: abstraction mismatch
• Tools support strings, ints, floats, sometimes dates.
• Problem domain involves higher-level categories:
  • University names
  • Person names
  • CMU phone numbers
  • CMU room numbers
• These data categories are:
  • Short human-readable strings
  • Multi-format
  • Sometimes ambiguous (non-binary scale of validity)
  • Often particular to certain groups of people
Limitations of existing approaches
• Types do not support questionable values
• Grammars do not, either, nor can they reformat
• Information extraction algorithms rely on grammatical cues that are absent during validation
• Cues, Forms/3, λ-calculus, Slate, pollution markers, etc., infer numerical constraints but not constraints on strings, nor are they platform-independent
New Approach: Topes
• A tope = a platform-independent abstraction describing how to recognize and transform strings in one category of data
• Greek word for “place,” because each corresponds to a data category with a natural place in the problem domain
A tope is a graph. Node = format, edge = transformation
Notional representation for a CMU room number tope…
• Formal building name & room number: Elliot Dunlap Smith Hall 225
• Building abbreviation & room number: EDSH 225
• Colloquial building name & room number: Smith 225
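The graph view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `TRANSFORMS` and `transform`, the format labels, and the one-building lookup tables are hypothetical and are not taken from the Topes SDK. Formats are nodes, each edge carries a transformation function, and a value is reformatted by following edges from its current format to the desired one.

```python
from collections import deque

# Illustrative lookup data covering a single building (assumption, not SDK data).
FORMAL_TO_ABBR = {"Elliot Dunlap Smith Hall": "EDSH"}
ABBR_TO_COLLOQUIAL = {"EDSH": "Smith"}


def _invert(table):
    """Return the reverse mapping of a one-to-one lookup table."""
    return {value: key for key, value in table.items()}


def _swap_building(table):
    """Build a transformation that rewrites the building part and keeps the room number."""
    def trf(value):
        building, room = value.rsplit(" ", 1)
        return f"{table[building]} {room}"
    return trf


# Edges of the graph: (source format, target format) -> transformation function.
TRANSFORMS = {
    ("formal", "abbreviation"): _swap_building(FORMAL_TO_ABBR),
    ("abbreviation", "formal"): _swap_building(_invert(FORMAL_TO_ABBR)),
    ("abbreviation", "colloquial"): _swap_building(ABBR_TO_COLLOQUIAL),
    ("colloquial", "abbreviation"): _swap_building(_invert(ABBR_TO_COLLOQUIAL)),
}


def transform(value, source, target):
    """Breadth-first search over the format graph, applying each edge's function."""
    queue = deque([(source, value)])
    seen = {source}
    while queue:
        fmt, current = queue.popleft()
        if fmt == target:
            return current
        for (src, dst), trf in TRANSFORMS.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, trf(current)))
    raise ValueError(f"no path from {source} to {target}")


# There is no direct formal -> colloquial edge, so two transformations are chained.
print(transform("Elliot Dunlap Smith Hall 225", "formal", "colloquial"))  # prints: Smith 225
```

Chaining edges means a tope with n formats does not need a hand-written transformation for every pair of formats, only enough edges to connect the graph.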
A tope is a conceptual abstraction. A tope implementation is code.
• Each tope implementation has executable functions:
  • 1 $\mathrm{isa}: \text{string} \to [0,1]$ function per format, for recognizing instances of the format (a fuzzy set)
  • 0 or more $\mathrm{trf}: \text{string} \to \text{string}$ functions linking formats, for transforming values from one format to another
• Validation function: $\phi(\mathit{str}) = \max_f \mathrm{isa}_f(\mathit{str})$, where $f$ ranges over the tope’s formats
  • Valid when $\phi(\mathit{str}) = 1$
  • Invalid when $\phi(\mathit{str}) = 0$
  • Questionable when $0 < \phi(\mathit{str}) < 1$
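As a rough illustration of the definitions above, the following Python sketch implements two isa functions for the room number example and takes their maximum as the validation score. Everything specific here is an assumption made for the example: the function names, the regular expressions, the sets of known building names, and the score of 0.5 for a well-formed value with an unrecognized building. None of it is taken from the Topes SDK.

```python
import re

# Illustrative data only (assumption): buildings the sketch treats as known.
KNOWN_ABBREVIATIONS = {"EDSH", "NSH", "WEH"}
KNOWN_COLLOQUIAL = {"Smith", "Newell", "Wean"}

ABBREVIATION_RE = re.compile(r"([A-Z]{2,4}) \d{3,4}[A-Z]?")
COLLOQUIAL_RE = re.compile(r"([A-Z][a-z]+) \d{3,4}[A-Z]?")


def isa_abbreviation(s):
    """isa for 'building abbreviation & room number'; returns a score in [0, 1]."""
    m = ABBREVIATION_RE.fullmatch(s)
    if m is None:
        return 0.0
    # Soft constraint: an unknown abbreviation is questionable rather than invalid.
    return 1.0 if m.group(1) in KNOWN_ABBREVIATIONS else 0.5


def isa_colloquial(s):
    """isa for 'colloquial building name & room number'; returns a score in [0, 1]."""
    m = COLLOQUIAL_RE.fullmatch(s)
    if m is None:
        return 0.0
    return 1.0 if m.group(1) in KNOWN_COLLOQUIAL else 0.5


ISA_FUNCTIONS = [isa_abbreviation, isa_colloquial]


def validate(s):
    """Validation score: the maximum isa score over all of the tope's formats."""
    return max(isa(s) for isa in ISA_FUNCTIONS)


def classify(s):
    """Map the validation score to valid / invalid / questionable."""
    score = validate(s)
    if score == 1.0:
        return "valid"
    if score == 0.0:
        return "invalid"
    return "questionable"


for example in ["EDSH 225", "EDXH 225", "Smith 225", "Robotics Institute"]:
    print(example, "->", classify(example))
```

Returning a graded score instead of a boolean is what lets a tool warn about a value such as “EDXH 225” without rejecting it outright.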
Today’s demonstration (using our latest version)
• Create phone number tope
  • Infer boilerplate from examples
  • What are formats, parts, and constraints?
  • Label parts, add/fix constraints, test in tool
• Validate spreadsheet data
• Transform spreadsheet data
• Reuse phone number tope
  • Create web application
  • Attach tope-based validator, configure, execute
  • Valid / invalid / questionable / valid-but-misformatted
Contributions highlighted today
• A model for data...
  • Short, human-readable strings
  • Ambiguous categories
  • Multiple formats
• Implementation features:
  • Inference of customizable formats from examples
  • Soft constraints
  • Human-readable error messages
  • Validation code is reusable across platforms
Other contributions not highlighted today
• Validating with topes (quantitatively) improves…
  • Accuracy of validation
  • Reusability of validation code
  • Subsequent duplicate identification
• Additional tool features:
  • Inter-tope reference (i.e., “topes in topes”)
  • Whitelists
  • Various additional auto-transformation features
  • Overriding auto-transformation with JavaScript
Validation and Tool Maturity
• Expressiveness
  • Have implemented dozens of topes
• Usability
  • Fast creation of accurate formats by users in study
• Usefulness
  • Integrated with Excel, Visual Studio, and an XML library
  • Integrated by IBM & Univ. Nebraska into other tools
Thank You…
• To Jeff Magee, Betty Cheng, Barbara Ryder, Margaret Burnett, and others at ICSE 2007 for early feedback
• To NSF for funding
• To ICSE 2008 for this opportunity to present
Available for download
http://www.cs.cmu.edu/~cscaffid/software.shtml
Or Google for "Topes SDK"