Creating a Corpus for A Conversational Assistant for Everyday Tasks. Henry Kautz, Young Song, Ian Pereira, Mary Swift, Walter Lasecki, Jeff Bigham, James Allen. University of Rochester.
Goals • Fine-grained activity recognition combining speech and RGBD vision • Learning and recognizing multi-step activities from (one-shot) instruction • Learning names and properties of objects from instruction • Tracking and assistance using task model
Sensor setup (figure labels): overhead mics, power meter, lapel mic, video, Kinect, open/close sensors, RFID sensors
Language to Logical Form • Example utterance: “I’m going to make a cup of tea.”
Extracted Events • Utterance: “I put it on the stove.” • Event: :event ont::put :agent user :theme v123 :start 0 :end 32 :utt 2 :speechtime/eventtime reln: overlap
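The keyword/value event frame extracted from “I put it on the stove.” can be held in a small record type. This is a minimal sketch, assuming Python; the class name `ExtractedEvent`, the method `to_frame`, and the field name `speech_event_relation` are hypothetical, and the slide does not say what units `:start` and `:end` are measured in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedEvent:
    """Hypothetical container for one event frame extracted from an utterance.

    Field names mirror the keywords on the slide; the class is illustrative only.
    """
    event: str                  # ontology type, e.g. "ont::put"
    agent: str                  # who performs the event, e.g. "user"
    theme: str                  # identifier of the object acted on, e.g. "v123"
    start: int                  # start offset as given (units not stated on the slide)
    end: int                    # end offset as given (units not stated on the slide)
    utt: int                    # index of the source utterance
    speech_event_relation: str  # temporal relation between speech time and event time

    def to_frame(self) -> str:
        # Render back into the keyword/value notation used on the slide.
        return (
            f":event {self.event} :agent {self.agent} :theme {self.theme} "
            f":start {self.start} :end {self.end} :utt {self.utt} "
            f":speechtime/eventtime reln: {self.speech_event_relation}"
        )


put_event = ExtractedEvent(
    event="ont::put", agent="user", theme="v123",
    start=0, end=32, utt=2, speech_event_relation="overlap",
)
print(put_event.to_frame())
```

A frozen dataclass keeps each extracted event immutable, which suits a corpus where frames are labeled once and then only read.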
Domains • Making tea - 12 subjects x 3 episodes • Making sandwiches • Building things with blocks • Coarse-grained home activities • Snack bar surveillance
Labeling Corpus • Labeled data is needed for: • Supervised learning methods • Evaluating supervised or unsupervised methods (a “gold standard”) • Process: • Define an event ontology • Hand-label the data • Have a second investigator review and correct the labels • Cost: about 1 hour of labeling per 2 minutes of recording • Alternative?
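Taking the quoted rate at face value (one hour of hand labeling per two minutes of recording, a 30:1 ratio), labeling cost grows linearly with the amount of data. The helper below is a hypothetical back-of-the-envelope sketch; its names are made up, and it does not try to separate the second investigator's review pass from the initial labeling.

```python
# Rate quoted on the slide: 1 hour of hand labeling per 2 minutes of recording.
LABEL_HOURS = 1.0
RECORDED_MINUTES = 2.0


def labeling_hours(recorded_minutes: float) -> float:
    """Estimated hand-labeling time (hours) for a given length of recording.

    Hypothetical helper: assumes cost scales linearly at the quoted rate.
    """
    return recorded_minutes * LABEL_HOURS / RECORDED_MINUTES


print(labeling_hours(2))   # 1.0 hour, the quoted rate itself
print(labeling_hours(60))  # 30.0 hours for one hour of recording
```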
Crowd AR • Idea: • Try to recognize activities using the current model • When confidence is low, ask human workers to label the video segment • Mediate their responses • Update the model with the new labels
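The Crowd AR idea is a confidence-gated loop: trust the model when it is sure, otherwise ask the crowd, mediate, and feed the result back. Below is a minimal sketch, assuming Python; `crowd_ar`, its callback parameters, the 0.8 `threshold`, and the toy lookup-table "model" are all hypothetical, and plain strings stand in for video segments.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def crowd_ar(
    segments: Sequence[str],
    predict: Callable[[str], Tuple[str, float]],
    ask_crowd: Callable[[str], List[str]],
    mediate: Callable[[List[str]], str],
    update: Callable[[str, str], None],
    threshold: float = 0.8,
) -> List[str]:
    """Label each segment, falling back to the crowd when the model is unsure."""
    labels: List[str] = []
    for segment in segments:
        label, confidence = predict(segment)      # 1. try the current model
        if confidence < threshold:                # 2. low confidence: ask workers
            responses = ask_crowd(segment)
            label = mediate(responses)            # 3. combine worker responses
            update(segment, label)                # 4. feed the new label back
        labels.append(label)
    return labels


# Toy stand-ins (hypothetical): the "model" is a lookup table of known segments.
known: Dict[str, str] = {"seg1": "walk"}
canned_responses = {"seg2": ["sit", "sit", "sit down"]}


def predict(segment: str) -> Tuple[str, float]:
    return (known[segment], 0.95) if segment in known else ("unknown", 0.1)


def ask_crowd(segment: str) -> List[str]:
    return canned_responses[segment]


def mediate(responses: List[str]) -> str:
    # Simple majority vote; a stand-in for a real mediator.
    return Counter(responses).most_common(1)[0][0]


def update(segment: str, label: str) -> None:
    known[segment] = label


labels = crowd_ar(["seg1", "seg2"], predict, ask_crowd, mediate, update)
print(labels)  # ['walk', 'sit']
```

Passing the model, crowd, and mediator in as callbacks keeps the control loop independent of any particular recognizer or crowdsourcing platform.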
Worker Interface • Workers watch a live video stream of an activity and enter open-ended text labels in the text field at the bottom • They can see the responses of other workers and of the learning model (HMM) to the right of the video, and can agree with a response by clicking on it.
Mediator • An example of the graph created by the input mediator • Green nodes represent sufficient agreement between multiple workers (here N = 2). • The final sequence matches the baseline despite incorrect (over-specific) submissions by 2 of the 3 workers and a spelling error by one worker on the word “walk”.
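The agreement rule (accept a label once N distinct workers support it, here N = 2) can be sketched for a single segment as follows. The slide does not describe how the mediator absorbs spelling errors or how its graph is built, so this sketch, assuming Python, swaps in fuzzy matching with the standard library's `difflib.get_close_matches` and skips the graph entirely; `mediate_segment`, `cutoff=0.8`, and the sample submissions are hypothetical.

```python
import difflib
from collections import defaultdict
from typing import Dict, List, Set


def mediate_segment(
    submissions: Dict[str, List[str]],
    n_required: int = 2,
    cutoff: float = 0.8,
) -> List[str]:
    """Return the labels for one segment that at least n_required distinct workers support.

    Near-identical spellings are folded onto the first-seen label with difflib
    (an assumed stand-in for however the real mediator tolerates typos).
    """
    canonical: List[str] = []                          # distinct labels, first-seen order
    support: Dict[str, Set[str]] = defaultdict(set)    # label -> workers who gave it
    for worker, labels in submissions.items():
        for raw in labels:
            label = raw.strip().lower()
            match = difflib.get_close_matches(label, canonical, n=1, cutoff=cutoff)
            if match:
                label = match[0]                       # treat as a misspelling of a known label
            else:
                canonical.append(label)
            support[label].add(worker)
    # Keep only labels with enough agreement (roughly analogous to the green nodes).
    return [label for label in canonical if len(support[label]) >= n_required]


submissions = {
    "worker1": ["walk", "open fridge"],
    "worker2": ["walkk", "open the fridge door"],       # typo + over-specific label
    "worker3": ["walk to the kitchen", "open fridge"],  # over-specific label
}
print(mediate_segment(submissions))  # ['walk', 'open fridge']
```

Counting workers in a set, rather than counting submissions, keeps one worker who repeats a label from reaching the threshold alone.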
Interactive Recognition and Labeling Experiments • Domain: coarse-grained activities • Model: HMM
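The slides name an HMM as the recognition model but give no states, observations, or parameters. The sketch below, assuming Python, shows standard HMM forward filtering over a made-up two-activity model and uses the best state's posterior as one plausible confidence signal for deciding when to ask the crowd; the states, probabilities, and 0.8 threshold are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def forward_filter(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """HMM forward filtering: (most probable state, its posterior) at each step."""
    results: List[Tuple[str, float]] = []
    belief = dict(start_p)
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition model.
            belief = {s: sum(belief[r] * trans_p[r][s] for r in states) for s in states}
        # Update: weight each state by how likely it is to emit this observation.
        unnorm = {s: belief[s] * emit_p[s].get(obs, 0.0) for s in states}
        total = sum(unnorm.values())
        if total == 0.0:
            raise ValueError(f"observation {obs!r} has zero probability under the model")
        belief = {s: p / total for s, p in unnorm.items()}
        best = max(states, key=lambda s: belief[s])
        results.append((best, belief[best]))
    return results


# Hypothetical two-activity model; none of these numbers come from the slides.
states = ["walking", "sitting"]
start_p = {"walking": 0.5, "sitting": 0.5}
trans_p = {
    "walking": {"walking": 0.8, "sitting": 0.2},
    "sitting": {"walking": 0.2, "sitting": 0.8},
}
emit_p = {
    "walking": {"motion": 0.9, "still": 0.1},
    "sitting": {"motion": 0.2, "still": 0.8},
}

results = forward_filter(["motion", "motion", "still"], states, start_p, trans_p, emit_p)
# Steps whose best posterior falls below a (hypothetical) threshold could go to the crowd.
uncertain_steps = [t for t, (_, p) in enumerate(results) if p < 0.8]
print(results)
print(uncertain_steps)  # [2]
```

Filtering uses only observations up to the current step, which fits a live setting where the system must decide on the spot whether to query workers.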
Monitoring Multi-Agent Scenarios • Surveillance of department honor snack bar • 85% correct on 11 trials
Parameterized & Complex Activities • Average number of objects and actions correctly labeled by worker groups of different sizes over two different activity sequences. • As the group size increases, more objects and actions are labeled.