Paul Groth and Yolanda Gil
Information Sciences Institute, University of Southern California

SCAFFOLDING INSTRUCTIONS TO LEARN PROCEDURES FROM USERS

Learning Procedures Naturally
• Humans learn procedures using a variety of mechanisms
  • Observation, practice, reading textbooks
• Human tutorial instruction
  • Broad descriptions of actions and explanations of their dependencies
  • The computer is told what to do by the instructor
• Goal: learn procedures from instruction that is natural to provide

What is Instruction by Telling?
• General statements
  • Do not refer to a specific state
• Descriptive statements
  • About types, functions, processes
• Examples:
  • “To dial a number, lift the receiver and punch the number”
  • “Place a pot with water on the stove. Turn the stove on and wait until the water bubbles”
  • “A good hovering area is behind a tall building that is more than 200 ft away”
  • “A vehicle is parked if it is stopped on the side of the road or if it is stopped in a parking lot for more than 3 minutes”

Why is Instruction by Telling Important?
• [Scaffidi et al 06]: by 2012, 90M end user programmers in the US alone
  • 13M would describe themselves as programmers
  • 55M will use spreadsheets and databases
• [Adams 08]: we have gone from dozens of markets of millions of users to millions of markets of dozens of users
  • The “long tail of programming” [Anderson 08]
• Today most successful end user applications focus on data manipulation through spreadsheets and web forms
• We need approaches to allow end users to specify procedures to process data or to control a physical environment
  • With examples
  • By telling: a natural method for humans, needed if procedures are complex and hard to generalize from examples

Shortcomings of Human Instruction [Gil 09]
• Organization
• Omissions
• Structure
• Errors
• Student’s preparation
• Student’s ability
• Teacher’s skills

“[…] Procedures are complex relational structures and the mapping between these structures and a linear sequence of propositions expressed in discourse is not easy to define.” – [Donin et al 02]

TellMe: Learning Procedures by Being Told
• Developed a four-stage process:
  • Ingestion: create an initial procedure stub from the given instruction
  • Elaboration: map terms to existing knowledge, infer missing information using heuristics, create hypotheses of procedures
  • Elimination: rule out hypotheses through symbolic execution
  • Selection: select one procedure hypothesis using heuristics that maximize consistency
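
The four stages above can be read as a small pipeline. The following is only a minimal sketch under stated assumptions, not TellMe's implementation: the `Hypothesis` class, the function names, the candidate exit conditions, and the two checks (a variable must be known; a condition on a variable the loop modifies is preferred) are all hypothetical stand-ins for the real knowledge mapping, symbolic execution, and consistency heuristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    steps: tuple  # ordered step names
    until: tuple  # (variable, test) loop exit condition


def ingest(instruction):
    """Ingestion: build a stub; anything the instruction omits stays None."""
    return {"steps": tuple(instruction["steps"]), "until": instruction.get("until")}


def elaborate(stub, candidates):
    """Elaboration: fill the gap with every candidate, one hypothesis per choice."""
    conditions = [stub["until"]] if stub["until"] is not None else list(candidates)
    return [Hypothesis(stub["steps"], c) for c in conditions]


def eliminate(hypotheses, known_variables):
    """Elimination (stand-in for symbolic execution): the tested variable must exist."""
    return [h for h in hypotheses if h.until[0] in known_variables]


def select(hypotheses, loop_variables):
    """Selection (stand-in for consistency): prefer a test on a variable the loop changes."""
    if not hypotheses:
        return None
    return max(hypotheses, key=lambda h: h.until[0] in loop_variables)


def learn(instruction, candidates, known_variables, loop_variables):
    stub = ingest(instruction)
    hypotheses = elaborate(stub, candidates)
    survivors = eliminate(hypotheses, known_variables)
    return select(survivors, loop_variables)


# The instruction names the steps but never says when the loop ends.
best = learn(
    {"steps": ["Deal", "Layout", "Decrement"]},
    candidates=[("deck", "is empty"), ("numOfCards", "== 0"), ("score", "> 10")],
    known_variables={"deck", "numOfCards", "hand"},
    loop_variables={"numOfCards"},
)
print(best.until)
```

Keeping each stage a separate function mirrors the slide: hypotheses are generated freely during elaboration and only pruned afterwards.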

An Example

Instruction is ambiguous & incomplete
• Which player? “Closest” could have meant closest sideline or it could have meant closest player
• What is “closest”?
• Repeat until when?

Instruction:
    start SetupKlondikeSolitaire hasId 8887
    resultIs name=solitaireGameSetup isa type=GameSetup
    initSetup name=deck isa type=CardDeck
    doThis name=Deal basedOn deck, 7 expect name=hand
    name=hand isa type=Hand
    doThis name=Layout basedOn hand
    name=numOfCards isa type=Integer
    name=numOfCards value=7
    repeat
    doThis name=Decrement basedOn numOfCards expect numOfCards

Procedure representation (the figure repeats it once per hypothesis; three hypotheses are marked X):
    <j.0:Procedure rdf:about="#SetupKlondikeSolitaire">
      <j.0:hasSteps rdf:parseType="Collection">
        <j.0:Procedure rdf:about="#Deal"/>
        <j.0:Procedure rdf:about="#Layout"/>
        <j.0:Loop>
          <j.0:until rdf:resource="#Unknown"/>
          <j.0:repeat rdf:parseType="Collection">
            <j.0:Procedure rdf:about="#Decrement"/>
            <j.0:Procedure rdf:about="#Deal"/>
            <j.0:Procedure rdf:about="#Layout"/>
          </j.0:repeat>
        </j.0:Loop>
      </j.0:hasSteps>
      <j.0:hasInput rdf:resource="#deck"/>
      <j.0:hasResult rdf:resource="#solitaireGameSetup"/>
    </j.0:Procedure>

1. Create initial interpretations based on prior knowledge, annotate gaps
2. Elaborate using heuristics for filling gaps
  • Repeat until left sideline is reached or right sideline or front line
  • Repeat until teammate or opponent has ball
3. Eliminate through symbolic execution and reasoning
4. Select based on heuristics that maximize consistency
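
The `#Unknown` loop condition in the procedure representation above is the kind of gap that is annotated in step 1. As a rough illustration, the sketch below scans such a representation for loops whose exit condition is missing or unknown. It is not TellMe's code: the slide does not show the namespace bound to the `j.0` prefix, so `http://example.org/tellme#` is a placeholder, the enclosing `rdf:RDF` element is added so the fragment parses on its own, and `find_gaps` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TELLME = "http://example.org/tellme#"  # placeholder: the real j.0 namespace is not shown

STUB = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:j.0="{TELLME}">
  <j.0:Procedure rdf:about="#SetupKlondikeSolitaire">
    <j.0:hasSteps rdf:parseType="Collection">
      <j.0:Procedure rdf:about="#Deal"/>
      <j.0:Procedure rdf:about="#Layout"/>
      <j.0:Loop>
        <j.0:until rdf:resource="#Unknown"/>
        <j.0:repeat rdf:parseType="Collection">
          <j.0:Procedure rdf:about="#Decrement"/>
          <j.0:Procedure rdf:about="#Deal"/>
          <j.0:Procedure rdf:about="#Layout"/>
        </j.0:repeat>
      </j.0:Loop>
    </j.0:hasSteps>
    <j.0:hasInput rdf:resource="#deck"/>
    <j.0:hasResult rdf:resource="#solitaireGameSetup"/>
  </j.0:Procedure>
</rdf:RDF>
"""


def find_gaps(xml_text):
    """Return one gap description per loop whose exit condition is missing or #Unknown."""
    root = ET.fromstring(xml_text.strip())
    gaps = []
    for loop in root.iter(f"{{{TELLME}}}Loop"):
        until = loop.find(f"{{{TELLME}}}until")
        if until is None or until.get(f"{{{RDF}}}resource") == "#Unknown":
            gaps.append("loop exit condition")
    return gaps


print(find_gaps(STUB))
```

Each reported gap is then a place where elaboration can propose candidate conditions.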

Example Heuristics
• Ingestion
  • If a variable is assigned a constant in the instruction, then find a consistent basic type for it.
• Elaboration
  • If the input of a component (i.e. subtask) is type compatible with the result of a preceding component, then that result could be connected to the input.
  • If two variables share any typing information, they could be unified.
• Elimination
  • Hypotheses with matching symbolic execution traces can be considered to be the same.
• Selection
  • Pick the simplest hypothesis (the one with the fewest components).
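
Three of these heuristics are simple enough to sketch directly. The code below is illustrative only and rests on assumptions: components are plain dictionaries, "type compatible" is reduced to exact type equality, and the `trace` argument is a stand-in for a real symbolic execution trace. None of the function names come from TellMe.

```python
def candidate_links(components):
    """Elaboration: propose (producer, consumer, input) links where types match."""
    links = []
    for i, consumer in enumerate(components):
        for producer in components[:i]:  # only preceding components
            for input_name, input_type in consumer["inputs"].items():
                if producer["result_type"] == input_type:
                    links.append((producer["name"], consumer["name"], input_name))
    return links


def deduplicate_by_trace(hypotheses, trace):
    """Elimination: hypotheses with the same trace count as one hypothesis."""
    seen = {}
    for hypothesis in hypotheses:
        seen.setdefault(trace(hypothesis), hypothesis)
    return list(seen.values())


def pick_simplest(hypotheses):
    """Selection: choose the hypothesis with the fewest components."""
    return min(hypotheses, key=len)


components = [
    {"name": "Deal", "inputs": {"deck": "CardDeck"}, "result_type": "Hand"},
    {"name": "Layout", "inputs": {"hand": "Hand"}, "result_type": "GameSetup"},
]
links = candidate_links(components)

hypotheses = [("Deal", "Layout"), ("Deal", "Layout"), ("Deal", "Deal", "Layout")]
# Stand-in trace: the step sequence itself.
unique = deduplicate_by_trace(hypotheses, trace=lambda steps: steps)
simplest = pick_simplest(unique)
print(links, unique, simplest)
```

Because the elaboration heuristic says a result "could be" connected, the sketch returns every compatible link instead of committing to one.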

Instructions that TellMe Can Process
    begin lesson
    start GetOpen hasId 8888
    repeat
    // Find your closest opponent.
    doThis name=GetCurrentPosition expect originalPosition
    name=originalPosition isa type=Position
    doThis name=FindClosestOpponent basedOn=originalPosition expect=opponentLocation
    // Dash away from them
    name=opponentLocation isa type=Position
    doThis name=FaceAwayFrom basedOn opponentLocation
    doThis name=Dash expect=currentPosition
    name=currentPosition isa type=Position
    // Stop and move back to your previous position (e.g. cut back).
    doThis name=MoveTowards
      basedOn originalPosition expect=currentPosition
    // If you are not open, do this again.
    until
      name=Open basedOn currentPosition
    // Once you're open, find the ball and face it.
    doThis name=FindTheBall expect=ballLocation
    name=ballLocation isa type=Position
    doThis name=Face basedOn ballLocation
    end
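
To make the structure of these instruction lines concrete, here is a small, hypothetical reader for single-line `doThis` steps. It is a sketch inferred from the listing above, not TellMe's parser: it assumes each keyword (`name`, `basedOn`, `expect`) is followed by one identifier, with or without `=`, and it does not handle steps that continue on a second line or argument lists such as `deck, 7`.

```python
import re

# A keyword, an optional "=", then a single identifier.
FIELD = re.compile(r"\b(name|basedOn|expect)\b\s*=?\s*(\w+)")


def parse_step(line):
    """Parse one 'doThis' line into a dict; return None for any other line."""
    line = line.strip()
    if not line.startswith("doThis"):
        return None
    return dict(FIELD.findall(line))


step = parse_step(
    "doThis name=FindClosestOpponent basedOn=originalPosition expect=opponentLocation"
)
loose = parse_step("doThis name=GetCurrentPosition expect originalPosition")
comment = parse_step("// Find your closest opponent.")
print(step, loose, comment)
```

Accepting both `expect=x` and `expect x` reflects the fact that the listing itself uses both spellings.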

Applying the Framework
• “A Scientific Workflow Construction Command Line” [Groth & Gil IUI-09]
• Real natural language descriptions of procedures:
  • Protocols in GenePattern [Reich et al 08]
  • Workflows in MyExperiment [DeRoure et al 09]
• Example used: “This workflow performs data cleansing on genes, clusters the results, and then displays a heatmap.”

Conclusion
• TellMe provides a framework for addressing learning from instruction given by humans
• The approach can be applied to different domains
• Future work includes:
  • More and better heuristics
  • Dealing with more sophisticated language constructs
  • Towards natural language input