540 likes | 736 Views
Doing More with LESS: Logic Embedded in SpreadSheets. Andre Valente (KSVentures and USC/ISI) David Van Brackle (ISX) Hans Chalupsky (KSVentures and USC/ISI) Gary Edwards (ISX). LESS: L ogic E mbedded in S pread S heets.
E N D
Doing More with LESS: Logic Embedded in SpreadSheets Andre Valente (KSVentures and USC/ISI) David Van Brackle (ISX) Hans Chalupsky (KSVentures and USC/ISI) Gary Edwards (ISX)
LESS: Logic Embedded in SpreadSheets • Logic Embedded in SpreadSheets (LESS) will provide the user with a system which combines the power of logic-based knowledge representation and reasoning with the familiar and easily-mastered user interface paradigm of a spreadsheet • LESS was developed under an SBIR contract from DARPA • Project Manager: David Gunning
Motivation • Spreadsheets are Widespread but Error Prone [Abraham & Erwig, 2003] • 55 Million users (= programmers) of spreadsheets in 2005 • Spreadsheets contain alarming rates of errors [Erwig and Burnett 2002] • Spreadsheets have their strengths… • Rapid model development • Interactive “What If” feedback • Scalable application of easily defined computational logic • … and weaknesses • Hard to express complex logical statements • Data structure hides relationships
Logic spreadsheets can • Improve quality (less errors) • Increase maintainability • Support the construction of more explicit and complex models • Be a practical platform for knowledge acquisition
Key Challenges • Basic Perspective: • A pragmatic, scruffy, engineering approach • Start with open mind – try things out, see what works • Usefulness – How to develop a useful tool, not just an academic exercise? • Answer: a real hybrid • Enrich the functionality of a spreadsheet while maintaining on its core interaction paradigm • Spreadsheet should not be a GUI for knowledge bases • Logic should not be just a function library in the spreadsheet • Usability – How to get the syntax and the human interaction right to make it usable • Answer: four tiers • Hide as much logic as needed • Allows different trade-offs
The LESS Model Tier 4: App Templates Tier 3: Full Integration of Spreadsheet and Logic System Tier 2: Tabular-oriented Knowledge Integrated into Spreadsheet GUI Tier 1: Logic functions available programmatically in formulas
PowerLoom functionality available to Spreadsheet functions as function calls Lifting Mappings – pushing info into the KB Query Mappings – extracting info from the KB Tier 1: Logic functions available programmatically in formulas ASSERT( “Age”, $A2, B2 ) IF( ASK( “Happy”, $A3 ), “Yes” )
Tier 2: Tabular-oriented Knowledge Integrated into Spreadsheet GUI • Instance Tables: A collection of instances of a given class will be represented as a special spreadsheet table • Creating a new Instance Table is equivalent to creating a Class in an Ontology • Rules are entered as text, much like Spreadsheet Functions
Tier 3: Full Integration of Spreadsheet and Logic System • Increased usability • Hide away details of logic formulations • Develop special GUIs integrated into the spreadsheet • Wizards • Special cell editors • TreeTable metaphors • Requires tighter integration of logic and spreadsheet data models
Tier 4: Application Templates • Templatize working LESS applications • Hide even more of logic to users • Better deployment, packaging mechanisms • Domain Paks • Templates captured by experts in a given field and widely distributed • Production tools (meta level) • Mechanisms for connecting to existing data (e.g. in DBMSs)
LESS Components • LESS integrates: • PowerLoom, a powerful logic KR&R system • Microsoft Excel, the most widely used spreadsheet in the world • LESS is implemented as an extension (add-in) to Microsoft Excel • Status: prototype implementation to be demonstrated here • LESS Excel add-in has 1.6Mb (includes PowerLoom libraries) • Implemented in C++ and Visual Basic
Component Interactions Excel Spreadsheet LESS – a mapping layer between Excel and PowerLoom Push Knowledge Into PowerLoom Pull Knowledge From PowerLoom Push Knowledge To Spreadsheet Invoke PowerLoom Functions Invoke Spreadsheet Functions PowerLoom
PowerLoom • PowerLoom is result of 20 years of research and practical use of KR&R systems at USC/ISI • Powerful and expressive • Unique “pragmatic” stance to tackle real world problems • Usability more important than theoretical "neatness" • Allow trade-offs between expressivity/completeness • A KR system for real-world applications • Used in HPKB, CoABS, RKF, EELD, etc.
PowerLoom Highlights • Representation in fully-expressive first-order predicate logic • KIF syntax with many extensions to standard FOL: • Type, set & cardinality relations (e.g., subset-of, range-cardinality) • Second-order definitions via holds • Classical negation and negation-by-failure, defaults • Frame-style definition language as syntactic sugar • Defconcept, defrelation, deffunction, definstance, defrule • Incremental monotonic and non-monotonic updates • Interleave definitions, assertions, retractions with retrieval and inference • Context mechanism separates name and assertion spaces with inheritance • Provides structuring mechanism, facilitates hypothetical reasoning • Several reasoning mechanisms • Reasoning specialists architecture for “plug-ins” (e.g., time reasoner) • RDBMS interface based on lifting axioms
PowerLoom Status and Distribution • Written in STELLA • Available in Lisp, C++ and Java • Both STELLA and PowerLoom are now Open Source • Recent development for PowerLoom • GNU or Mozilla licenses • Current release: PowerLoom 3.0.2.beta • Distributed as Lisp, C++ and Java source • More recently in STELLA source • ~600 downloads world-wide • ~400 subscribers to the mailing lists
A Key Insight: Relational vs. Positional Referencing (1) • Spreadsheets use positional referencing to refer to information • G5 = relative cell reference • $G$5 = absolute cell reference • A5:D5 = range of cells {A5 B5 C5 D5} • Positional Referencing is very intuitive in the beginning… • But it is highly error-prone • More complicated than it seems at a first glance • Spreadsheets try hard do update references when changes are made (e.g., a line is inserted), but errors are frequently introduced • Also, restricted to two dimensions
A Key Insight: Relational vs. Positional Referencing (2) • Logic uses relational referencing to refer to information • Meaning is conveyed by relations and assertions • Information is referenced by open variables as well as role in relations • Example: • Logic function (hours ?person ?month ?task) • Means the number of hours a ?person spends on a ?task in a ?month • N-ary relations and more powerful proposition specification languages add power to spreadsheets
A Key Insight: Relational vs. Positional Referencing (3) • Key insight of LESS is to add relational referencing to spreadsheets • Mixed model allies best of both worlds • Cells can still be used to refer to data • Cell Variables can be bound to contents of cells • Logic propositions express relationship between several information elements
LESS Features – Highlights • Explicit relationships • Multi-dimensional relations • Better, Explicit (Business) Rules • Powerful information querying • Platform for Knowledge Acquisition/Engineering • Platform for deploying Knowledge-Based Systems
About Our Examples • Three main examples: • Pilot training spreadsheet • Budget spreadsheet • Grading spreadsheet • All examples are real-life spreadsheets • Pilot training example based on a spreadsheet used by ANG 149th Tactical Fighter Squadron • Budget spreadsheet was used by a small business for writing cost proposals for government contracts • Grading spreadsheet was used to grade an actual course as well as an exercise for a class in spreadsheets • Sometimes simplified for presentation purposes • Syntax in these slides is simplified (we are still adding syntactic sugar)
LESS Features – Highlights • Explicit relationships • Multi-dimensional relations • Better, Explicit (Business) Rules • Powerful information querying • Platform for Knowledge Acquisition/Engineering • Platform for deploying Knowledge-Based Systems
Positional Referencing Looks Easier Than It Is (1) • For example, B6 contains the formula =SUM(B3:B5) • Simple, easy…
Positional Referencing Looks Easier Than It Is (2) • Even such a simple sheet generates errors when a new employee (line) is added! • Totals stayed the same but should have been updated
Positional Referencing Looks Easier Than It Is (3) • What is the problem? • In positional referencing SUM(B3:B5) really means “sum the three cells above me” • This clearly is not what is intended • Intention of the user is “sum the Jan-05 hours for all persons in Task 1” • Relational referencing allows a user to say just that!
Relational Referencing in LESS Makes Relationship Explicit (1) • With LESS, we want to express the relationships explicitly • (hours ?person ?month ?task) • Means the number of hours a ?person spends on a ?task in a ?month • In order to add relational information to a spreadsheet, we need to add a lifting formula • One way is to add a lifting formula on top of each column: • =ASSERTMULTI(“hours”, A3:A5, B2, $A$1) • This expands into and performs a series of assertions, e.g.: • (hours A3 B2 A1) = (hours “Fred” Jan-05 “Task 1”) = 12 • (hours A4 B2 A1) = (hours “Wilma” Jan-05 “Task 1”) = 4
Relational Referencing in LESS Makes Relationship Explicit (2) • Once the information is lifted into PowerLoom, we can easily express a cell as “sum the Jan-05 hours in Task 1 for all persons” • B6 contains the formula: =SUM (RETRIEVEALL (“hours ?person Jan-05 “Task 1”)”)) • Notice mix of logic and spreadsheet functions
LESS Features – Highlights • Explicit relationships • Multi-dimensional relations • Better, Explicit (Business) Rules • Powerful information querying • Platform for Knowledge Acquisition/Engineering • Platform for deploying Knowledge-Based Systems
Multi-dimensional relations (1) • Problem: positional referencing only works when information has at most two dimensions (3-arity relations) • Intuitively easy to see: user cannot make a range with three dimensions
Multi-dimensional relations (2) • Example: calculate totals across tables • Formula looks much less intuitive, e.g. B24=B3+B10+B17 • Can’t use ranges • Extremely hard to maintain and error prone! • E.g., if a new task is added or there are many tasks • What you really want to say is “sum the hours in Jan-05 for Fred in all tasks”
What happened? • Task table can be seen as a model with two dimensions – person and monthly hours • BUT: Overall information model reflects indexing in three dimensions – person, month and task • When information needs to be aggregated across more than two dimensions, the spreadsheet model suffers • Pivot tables can overcome some limitations, but only for some cases
LESS Supports Multi-Dimensional Relations • Assuming the same lifting described earlier, we can easily express “sum the hours in Jan-05 for Fred in all tasks” as: =SUM (RETRIEVEALL ("(hours Fred Jan-05 ?task)”)) • This can also easily be turned in to an Excel-like template formula by mixing cell variables: =SUM (RETRIEVEALL ("(hours A24 B$23 ?task)”))
LESS Features – Highlights • Explicit relationships • Multi-dimensional relations • Better, Explicit (Business) Rules • Powerful information querying • Platform for Knowledge Acquisition/Engineering • Platform for deploying Knowledge-Based Systems
LESS Provides Better, Explicit Rules • Spreadsheets encode business rules in formulas • Several problems: • Rules are hidden from view – harder to audit, opaque • Rules have to be repeated everywhere they are used – higher chance for errors, difficult to maintain • Another problem is that spreadsheet formulas are very poor in expressing rules that are not strictly mathematical formulas • No good mechanism for complex conditions • LESS allows a user to model explicit (business) rules • Better methodology (trend towards explicit business rules) • Easier to debug and maintain • Improves visibility, communication
Better, Explicit Rules - Example (1) • In the example below, the formula to calculate the grade letter (column H) requires nested conditionals: • IF($H4<70,"D", IF($H4<75,"C-", IF($H4<78,"C", IF($H4<80,"C+", IF($H4<85,"B-", IF($H4<88,"B", IF($H4<90,"B+", IF($H4<95,"A-","A")))))))) • Opaque, hard to read, easy to make mistakes • “Rule” is repeated once per line – error prone, hard to maintain
Better, Explicit Rules – Example (2) • In LESS we can define rules elsewhere and simply apply them • In the same example, the formula to calculate the grade letter (column H) is a simple query: RETRIEVE("1 (letter-grade ", A3, " ?x)" ) ) • Notice how the rules are defined once, applied several times
Better, Explicit Rules – Example (3) • In LESS rules can be defined separately and explicitly, and reused everywhere:
Better, Explicit Rules – Example (4) • Rules can also be parameterized on Excel data – a nice fit with Excel modeling style:
Better, Explicit Rules – Example (5) • For ultimate elegance, we can use templates (Phase II):
LESS Features – Highlights • Explicit relationships • Multi-dimensional relations • Better, Explicit (Business) Rules • Powerful information querying • Platform for Knowledge Acquisition/Engineering • Platform for deploying Knowledge-Based Systems
Powerful Information Querying • Spreadsheets have limitations on how to summarize information • Particularly if the information is symbolic • Basic problem is that it is hard to select information from other parts of the spreadsheet • Worse if information is in multiple tables and/or has more than three dimensions • Special wizards such as Excel pivot tables help overcome some of these deficiencies • But they still have restricted expressivity
LESS Demo – Squadron Training Examples Also – Tiers 2, 3
LESS Features – Highlights • Explicit relationships • Multi-dimensional relations • Better, Explicit (Business) Rules • Powerful information querying • Platform for Knowledge Acquisition/Engineering • Platform for deploying Knowledge-Based Systems
New Possibilities Opened with LESS • One way to look at logic spreadsheets is for logic to add spice to spreadsheets • Delivers incremental improvements • (Only) user is spreadsheet user • We found that logic spreadsheets can be good for knowledge engineers as well • Two types of applications: • Template-based knowledge acquisition • Reasoning
Template-based knowledge acquisition (1) • A common problem in knowledge acquisition is to acquire knowledge about many similar objects • KA approaches are hard to understand for SMEs • Require understanding of modeling decisions • For instance, not reasonable to ask an SMEs to model (example from DARPA ARPI and JFACC Programs): (defconcept F-15 :IS-PRIMITIVE Fighter-Aircraft :IMPLIES (:AND (:FILLED-BY NICKNAME "Eagle") (:FILLED-BY MADE-BY MCDONNELL-DOUGLAS) (:EXACTLY 2 ENGINES) (:ALL GUN M61A1) (:FILLED-BY IN-SERVICE 1972) (:FILLED-BY PRIMARY-FUEL JP-4) (:FILLED-BY ALTERNATE-FUEL JP-5 JP-8 JET-A JET-A-1 JET-B)) :DEFAULTS (:AND (:FILLED-BY CEILING 65000ft) (:FILLED-BY COST 15000000) (:FILLED-BY CREW-SIZE 1)))
Template-based knowledge acquisition (2) • An approach to deal with this problem is to have “knowledge templates” • Templates abstract the way information is used in the logic formalism • Middleware code (e.g. a Lisp macro) “routs” the information in the template to its place in the knowledge base (e.g., slot filler, type, default value) • Example of template implemented as Lisp macro: (def-aircraft-c F15 :types fighter-attack-aircraft :nickname "Eagle" :made-by McDonnell-Douglas :engine-count 2 :primary-fuel JP-4 :alternate-fuel (JP-5 JP-8 Jet-A Jet-A-1 Jet-B)) • Ideal division of labor: • SME enters information easily • Knowledge engineer designs templates to map into a KB
Template-based knowledge acquisition (3) • Logic Spreadsheets are great for implementing knowledge templates • Table format is natural for most SMEs • Spreadsheet is widely used (info may already be in a spreadsheet!) • Information also available for other uses • Example:
LESS Features – Highlights • Explicit relationships • Multi-dimensional relations • Better, Explicit (Business) Rules • Powerful information querying • Platform for Knowledge Acquisition/Engineering • Platform for deploying Knowledge-Based Systems