
Creating a Corpus for a Conversational Assistant for Everyday Tasks. Henry Kautz, Young Song, Ian Pereira, Mary Swift, Walter Lasecki, Jeff Bigham, James Allen, University of Rochester


Presentation Transcript


  1. Creating a Corpus for a Conversational Assistant for Everyday Tasks • Henry Kautz, Young Song, Ian Pereira, Mary Swift, Walter Lasecki, Jeff Bigham, James Allen • University of Rochester

  2. Goals • Fine-grained activity recognition combining speech and RGBD vision • Learning and recognizing multi-step activities from (one-shot) instruction • Learning names and properties of objects from instruction • Tracking and assistance using a task model

  3. Sensor setup (figure labels): overhead mics • power meter • lapel mic • video • Kinect • open/close sensors • RFID sensors

  4. Language → Logical Form • Example utterance: “I’m going to make a cup of tea.”

  5. Extracted Events • Utterance: “I put it on the stove.” • Extracted event: :event ont::put :agent user :theme v123 :start 0 :end 32 :utt 2 • Speech time / event time relation: overlap
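The slots of an extracted event can be held in a small record type. This is a minimal sketch, assuming Python dataclasses; the names `Interval`, `Event`, and `temporal_relation` are hypothetical, the units of :start/:end are not stated on the slide, and the three-way before/overlap/after test is a simplification rather than the project's actual temporal reasoning. The speech interval used below is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed time span; units follow whatever the corpus timestamps use (unspecified)."""
    start: float
    end: float


@dataclass(frozen=True)
class Event:
    """Mirrors the slots on the slide: :event, :agent, :theme, :start/:end, :utt."""
    event_type: str  # e.g. "ont::put"
    agent: str       # e.g. "user"
    theme: str       # referent id, e.g. "v123"
    span: Interval   # :start / :end
    utt: int         # utterance index


def temporal_relation(speech: Interval, event: Interval) -> str:
    """Coarse relation of speech time to event time (simplified, not full interval algebra)."""
    if speech.end < event.start:
        return "before"   # speech finished before the event began
    if event.end < speech.start:
        return "after"    # speech began after the event finished
    return "overlap"      # the two spans share at least one time point


# The event from the slide; the speech interval (10, 25) is a made-up example.
put_event = Event("ont::put", agent="user", theme="v123", span=Interval(0, 32), utt=2)
print(temporal_relation(Interval(10, 25), put_event.span))  # overlap
```

A richer treatment would distinguish finer relations (e.g. "during" vs. "overlaps"); the three coarse classes are kept here only to show where the slide's "overlap" label comes from.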

  6. Domains • Making tea: 12 subjects × 3 episodes • Making sandwiches • Building things with blocks • Coarse-grained home activities • Snack bar surveillance

  7. Labeling the Corpus • Labeled data is needed for supervised learning methods and for evaluating supervised or unsupervised methods (a “gold standard”) • Process: define an event ontology, hand-label, then have a second investigator review and correct • Cost: about 1 hour of labeling per 2 minutes of recorded data • Alternative?

  8. Crowd AR • Idea: try to recognize activities using the current model • When confidence is low, ask human workers to label the video segment • Mediate the workers’ responses • Update the model with the new labels
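The four steps of the Crowd AR idea form a simple confidence-gated loop. The sketch below is illustrative only: `predict`, `ask_crowd`, `mediate`, and `update_model` are hypothetical hooks standing in for the recognizer, the worker interface, the mediator, and retraining, and the 0.6 confidence threshold is an assumed value, not one from the slides.

```python
from typing import Callable, List, Sequence, Tuple


def crowd_ar(
    segments: Sequence[str],
    predict: Callable[[str], Tuple[str, float]],
    ask_crowd: Callable[[str], List[str]],
    mediate: Callable[[List[str]], str],
    update_model: Callable[[List[Tuple[str, str]]], None],
    threshold: float = 0.6,  # assumed cut-off for "low confidence"
) -> List[Tuple[str, str, str]]:
    """One pass of the Crowd AR loop over a batch of video segments.

    Returns (segment, label, source) triples, where source is "model" or "crowd".
    """
    results: List[Tuple[str, str, str]] = []
    new_examples: List[Tuple[str, str]] = []
    for segment in segments:
        label, confidence = predict(segment)        # 1. try the current model
        if confidence < threshold:                   # 2. low confidence: ask workers
            label = mediate(ask_crowd(segment))      # 3. merge their answers
            new_examples.append((segment, label))
            results.append((segment, label, "crowd"))
        else:
            results.append((segment, label, "model"))
    if new_examples:
        update_model(new_examples)                   # 4. retrain on crowd labels
    return results
```

Only low-confidence segments cost worker time, which is the point of the idea relative to hand-labeling every segment.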

  9. Worker Interface • Workers watch a live video stream of an activity and enter open-ended text labels into the text field at the bottom • To the right of the video they can see the responses of other workers and of the learning model (HMM), and can agree with a response by clicking on it

  10. Mediator • An example of the graph created by the input mediator • Green nodes represent sufficient agreement between multiple workers (here N = 2) • The final sequence matches the baseline despite incorrect (over-specific) submissions by 2 of the 3 workers and a spelling error by one worker on the word “walk”
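The slide shows the mediator as a graph built over time. The sketch below collapses that into a per-time-step vote so the agreement rule (accept a label once N = 2 workers agree) is easy to see. Beyond what the slide states, it assumes labels are lowercased and that labels within one edit of each other are pooled, which is one possible way to absorb a spelling error like the one on “walk”. `mediate_step`, `mediate_sequence`, and the example labels are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def mediate_step(labels: List[str], n_required: int = 2, max_typo: int = 1) -> Optional[str]:
    """Return a label supported by at least n_required workers, else None.

    Labels within max_typo edits of a candidate count as votes for it;
    ties go to the label seen first.
    """
    votes = Counter(label.strip().lower() for label in labels)
    best, best_support = None, 0
    for candidate in votes:
        support = sum(count for other, count in votes.items()
                      if edit_distance(candidate, other) <= max_typo)
        if support > best_support:
            best, best_support = candidate, support
    return best if best_support >= n_required else None


def mediate_sequence(steps: List[List[str]], n_required: int = 2) -> List[str]:
    """Mediate each time step, keeping only agreed labels and collapsing repeats."""
    sequence: List[str] = []
    for labels in steps:
        label = mediate_step(labels, n_required)
        if label is not None and (not sequence or sequence[-1] != label):
            sequence.append(label)
    return sequence
```

With hypothetical inputs `[["walk", "wallk", "walk to the kitchen"], ["open fridge", "open fridge door", "open fridge"]]`, the misspelling is pooled with "walk" and the over-specific labels lack a second supporter, so the agreed sequence is "walk" then "open fridge".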

  11. Interactive Recognition and Labeling Experiments • Domain: coarse-grained activities • Model: HMM
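The experiments use an HMM over coarse-grained activities. As a hedged illustration of how such a model assigns activity labels to a stream of observations, here is a standard Viterbi decoder in log space. The two states, the "still"/"moving" observations, and all probabilities are toy values invented for this sketch, not parameters from the study.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Most likely hidden-state path and its log-probability (all probabilities must be > 0)."""
    # best[s]: log-probability of the best path ending in state s at the current step
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}
    backpointers: List[Dict[str, str]] = []
    for obs in observations[1:]:
        step_ptr: Dict[str, str] = {}
        new_best: Dict[str, float] = {}
        for s in states:
            prev = max(states, key=lambda p: best[p] + math.log(trans_p[p][s]))
            new_best[s] = best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            step_ptr[s] = prev
        backpointers.append(step_ptr)
        best = new_best
    # Trace back from the most likely final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for step_ptr in reversed(backpointers):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Toy model: invented numbers for illustration only.
STATES = ("sitting", "walking")
START = {"sitting": 0.6, "walking": 0.4}
TRANS = {"sitting": {"sitting": 0.8, "walking": 0.2},
         "walking": {"sitting": 0.3, "walking": 0.7}}
EMIT = {"sitting": {"still": 0.9, "moving": 0.1},
        "walking": {"still": 0.2, "moving": 0.8}}

path, log_p = viterbi(["still", "still", "moving", "moving"], STATES, START, TRANS, EMIT)
print(path)  # ['sitting', 'sitting', 'walking', 'walking']
```

Working in log space avoids numeric underflow on long video streams; the argmax is unchanged because the logarithm is monotonic.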

  12. Privacy

  13. Monitoring Multi-Agent Scenarios • Surveillance of the department's honor snack bar • 85% correct on 11 trials

  14. Parameterized & Complex Activities • Average number of objects and actions correctly labeled by worker groups of different sizes over two different activity sequences. • As the group size increases, more objects and actions are labeled.
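One simple way to compute the quantity this slide plots (the average number of ground-truth objects and actions named by a group of k workers) is to average, over every k-worker subset, the number of ground-truth items in the union of that subset's labels. This sketch assumes exact string matching against a ground-truth set, and the function name and worker label sets are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import List, Set


def mean_coverage(worker_labels: List[Set[str]], truth: Set[str], group_size: int) -> float:
    """Average count of ground-truth items named by at least one member of each k-worker group."""
    return mean(
        len(set().union(*group) & truth)
        for group in combinations(worker_labels, group_size)
    )


# Hypothetical tea-making labels from three workers.
TRUTH = {"kettle", "cup", "fill", "pour", "boil"}
WORKERS = [{"kettle", "fill"}, {"cup", "pour"}, {"kettle", "boil", "spoon"}]

for k in range(1, len(WORKERS) + 1):
    print(k, mean_coverage(WORKERS, TRUTH, k))
```

Because a larger group's union contains every smaller sub-group's union, the average coverage cannot shrink as k grows, which matches the trend described on the slide. Off-target labels such as "spoon" simply do not count.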
