An Oxygenated Presentation Manager Larry Rudolph Oxygen Workshop, January, 2002 Larry Rudolph & Shalini Agarwal
Goals & Overview • Integrate Many Oxygen Technologies • Application Driven • Use an application that we understand • Personally use often • Would help if it were more human-centric • Portable (as opposed to E-21) • Develop Architectural Infrastructure • Exposes new requirements • Critique of Presentation Manager • What is wrong with it • What needs improvement
Application Scenario
An Oxygen Application Components • Input • Vision • Speech • Touch • Output • Projector • Handheld • Archive • Processing • Changing configuration • Equipment • Today, it is too hard • Linux laptop; Windows laptop; camera; microphone; network; projector; power blocks • Tomorrow, much easier • a couple of H21s
Camera watching laser point on screen • Camera Challenges • Inexpensive ones have wrong focal length • Alignment issues • Use edge of screen, display pattern, figure out from what is known to be visible • We ended up displaying a pattern of concentric circles • Relative size of laser point depends on distance • Beyond ten feet, had to use only certain types of lasers • Could slow down the camera and let pixels saturate (too complicated)
Camera watching laser point on screen (cont) • Camera Interface • Click at point (x,y) • Hold laser at same location for 5 seconds • Select horizontal line ( (x1,y1) , (x2,y1) ) • Sweep laser back and forth, line is diameter of ellipse • Select object centered at point (x,y) • Sweep laser in circle, point is center of circle • Previous or Next • Click in left (right) 1/8 of screen
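A minimal sketch of how two of the gestures above (dwell-to-click and the Previous/Next screen margins) might be mapped to commands. It assumes the vision component delivers timestamped laser positions normalized to [0, 1]; the function names, the sample format, and the jitter tolerance are illustrative assumptions, not part of the original system. Only the 5-second dwell and the 1/8-screen margins come from the slide.

```python
from typing import List, Optional, Tuple

# (time in seconds, x, y); x and y are assumed to be normalized to [0, 1]
Sample = Tuple[float, float, float]

DWELL_SECONDS = 5.0    # from the slide: hold the laser for 5 seconds to click
EDGE_FRACTION = 1 / 8  # from the slide: left/right 1/8 of the screen
JITTER = 0.02          # assumed tolerance for a shaky hand-held pointer


def detect_dwell(samples: List[Sample]) -> Optional[Tuple[float, float]]:
    """Return the (x, y) where the laser first stayed put for DWELL_SECONDS, else None."""
    start = 0
    for i, (t, x, y) in enumerate(samples):
        t0, x0, y0 = samples[start]
        if abs(x - x0) > JITTER or abs(y - y0) > JITTER:
            start = i  # pointer moved: restart the dwell window at this sample
        elif t - t0 >= DWELL_SECONDS:
            return (x0, y0)
    return None


def click_to_command(x: float, y: float) -> str:
    """Map a click position to a presentation command."""
    if x < EDGE_FRACTION:
        return "previous"
    if x > 1 - EDGE_FRACTION:
        return "next"
    return f"click({x:.2f}, {y:.2f})"
```

The sweep gestures (line and circle selection) would need a shape fit over the sample path and are left out of this sketch.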
Microphone listening to speaker • Microphone • Many technologies; • Lapel-mic; mic array; room microphone • Current approach: ipaq • Continuous recognition • Push to speak • Audio server on ipaq • Detects start and stop • Best results when human pushes to start and releases to stop • Audio wave file sent to Galaxy speech system • Galaxy output actions via CGI-script • A nice unifying mechanism • One more complicated component
Speaker controlling presentation via ipaq • Ipaq output to CGI-script Server • Same actions as from speech server • Actions are • Next slide, Previous slide, Goto slide #n, Goto slide named <xxx> • Next item, Previous item, Goto item #n, Goto item named <xxx> • Next animation, Previous animation, Goto animation #n • Start presentation <name>, End presentation, Pause presentation • Initialize camera, Test microphone • Handheld (Ipaq) display • GUI generated from speechbuilder grammar • List of slides, items per slide • Currently use ad-hoc solution where powerpoint sends lists to ipaq. Need a more automatic solution
Output to projector, handheld, archive • Unlimited number of video / audio output producers • E.g. powerpoint just one producer of output • At any time, each output device has an associated producer • This producer can receive input from several producers • Handheld has proxy • To reduce bandwidth to ipaq • Current slide, list of slides, list of commands • Archive • Each slide shown, audio (from a different microphone) sent to archive • Currently just gif of current slide
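One way to picture "at any time, each output device has an associated producer" is a small routing table that the presenter can rewire. This is only a sketch: the class name, method names, and the device and producer names in the usage note are hypothetical and do not come from the Oxygen code.

```python
from typing import Dict, Optional


class OutputRouter:
    """Tracks which producer currently feeds each output device (illustrative only)."""

    def __init__(self) -> None:
        self._producer_for: Dict[str, str] = {}

    def connect(self, device: str, producer: str) -> Optional[str]:
        """Attach producer to device and return the producer it replaced, if any."""
        previous = self._producer_for.get(device)
        self._producer_for[device] = producer
        return previous

    def disconnect(self, device: str) -> Optional[str]:
        """Detach whatever feeds device and return it, or None if nothing was attached."""
        return self._producer_for.pop(device, None)

    def producer_for(self, device: str) -> Optional[str]:
        """Return the producer currently feeding device, or None."""
        return self._producer_for.get(device)
```

For example, `router.connect("projector", "powerpoint")` followed by `router.connect("projector", "media-player")` switches the projector's producer and returns `"powerpoint"` as the one that was replaced.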
Processing – controlling session • Do not let powerpoint control the world • Slide viewer; movie player; program execution; browser; etc • Want to mix all types of applications • Presenter has control of the output • E.g. switch output producer from powerpoint to media player • Remove interrupting technologies • Dynamically disconnect any input / output source • All done via core language • Or some other glue language, e.g. meta-glue • Which does all the other infrastructure issues
Multi-Modal Input Shalini Agarwal Oxygen Conference January 8th, 2002
Initial Experience With Presentation Manager • One Single Monolithic Context • Command within slide, between slides, between applications • Problem • Too many false positives • Preliminary Solution • Slide tracking • e.g. recognize “Next Slide” command only after at least 60% of words on slide have been said • e.g. recognize “Show Demo” only after slide 17 • Still lots of problems • Many slide styles hard to track (e.g. figures not words on slide) • Tracking for within slide different than for between slides
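The slide-tracking heuristic above (accept "Next Slide" only after 60% of the slide's words have been spoken) could look roughly like this. The class, its method names, and the word-splitting rule are assumptions made for illustration; the treatment of word-free slides is also an assumption, since the slide lists such slides as an open problem.

```python
import re
from typing import Set


def words(text: str) -> Set[str]:
    """Lower-cased set of alphabetic words in text."""
    return set(re.findall(r"[a-z']+", text.lower()))


class SlideTracker:
    """Gate the 'next slide' command on how much of the slide has been spoken."""

    def __init__(self, slide_text: str, threshold: float = 0.6) -> None:
        self.slide_words = words(slide_text)
        self.heard: Set[str] = set()
        self.threshold = threshold  # 0.6 is the 60% figure from the slide

    def hear(self, utterance: str) -> None:
        """Record which of the slide's words appear in the recognized utterance."""
        self.heard |= words(utterance) & self.slide_words

    def coverage(self) -> float:
        """Fraction of the slide's distinct words spoken so far."""
        if not self.slide_words:
            return 1.0  # assumption: a slide with no words (e.g. a figure) is not gated
        return len(self.heard) / len(self.slide_words)

    def accept_next_slide(self) -> bool:
        return self.coverage() >= self.threshold
```

A tracker like this only covers the between-slides case; tracking position within a slide would need per-item word sets.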
A Better Solution: Multiple Contexts • Very Active Research Area • Intelligent-room project; Galaxy; Others • Three layers, each having its own context • Slide (Next Item, Next Animation) • Presentation (Next Slide, Goto Conclusion, Goto Example) • Session (Start Presentation, Switch to Browser, Show Questions) • Challenges • Each context requires its own speech recognition system • Multicasting sound wave to each system • Selecting the best result
Extending the Galaxy System • Start with context for speech and then extend • Note, our goals are similar but not identical to those of the Spoken Language Group • We are not dialog-based • Exploit their work • Follow Galaxy • Recognizer scores different guesses at words • Language Processing Unit uses input grammar to select best input sentence • Scott Cyphers gave us the nbest interface [Diagram labels: Language Generation; Speech Synthesis; Dialogue Management; Hub; Audio Database Server; Speech Recog.; Discourse Resolution; Language Processing]
Recognizer chooses 10 best guesses at word matches (for this context). Language Processor picks best sentence from recognizer based on input grammar.
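A rough sketch of the second step: given the recognizer's n-best list, keep only the hypotheses the context's grammar accepts and take the best-scoring one. Modeling the grammar as a plain set of sentences is a deliberate simplification, and the function name and scores are hypothetical; this is not the Galaxy nbest interface.

```python
from typing import List, Optional, Set, Tuple

# (sentence, recognizer score); higher is assumed to mean more likely
Hypothesis = Tuple[str, float]


def pick_best_sentence(nbest: List[Hypothesis], grammar: Set[str]) -> Optional[Hypothesis]:
    """Return the highest-scoring hypothesis accepted by the grammar, or None."""
    valid = [h for h in nbest if h[0] in grammar]
    return max(valid, key=lambda h: h[1]) if valid else None
```

With a presentation-layer grammar containing "go to slide nine", a list such as `[("go to twenty nine", 0.9), ("go to slide nine", 0.8), ("go to nine", 0.7)]` yields the second entry even though it is not the recognizer's top guess.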
System Structure [Diagram: Sound Input feeds three layers, each with its own Recognizer and Language Processor, and a Selector sits after the layers. Phrases shown per layer: Slide Layer: “next item”, “next movie”, “previous item”; Presentation Layer: “go to slide nine”, “go to twenty nine”, “go to nine”; Session Layer: “start presentation”, “end presentation”, “start explorer”]
Add Recognizer for T9 [Diagram: the System Structure diagram with a T9 Input added next to Sound Input; phrases shown: “next item” (Slide Layer), “go to slide nine” (Presentation Layer), “start presentation” (Session Layer)]
Add Recognizer for Graffiti [Diagram: the System Structure diagram with T9 Input and Graffiti Input added next to Sound Input; phrases shown: “next item” (Slide Layer), “go to slide nine” (Presentation Layer), “start presentation” (Session Layer)]
Other Input Modes • T9 (telephone keypad) • To input a, b, or c press “2”; • Current cell phones have dictionary to select correct word • Lots of false positives (very annoying) • Remember my introduction? • Using an application-dependent grammar would reduce errors • Pen-based character input • Use strokes to input characters • Current palm pilot only recognizes “Graffiti” alphabet • Lots of false positives (very annoying) • Using an application-dependent grammar would reduce errors
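To illustrate why an application-dependent grammar reduces T9 errors, the sketch below decodes a digit sequence against a vocabulary. The keypad layout is the standard telephone one; the function names and both word lists in the usage note are made up for the example.

```python
from typing import Iterable, List

# Standard telephone keypad letter groups
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word: str) -> str:
    """Digit sequence typed for a purely alphabetic word."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def t9_candidates(digits: str, vocabulary: Iterable[str]) -> List[str]:
    """All vocabulary words whose key sequence equals digits, in sorted order."""
    return sorted(w for w in vocabulary if to_digits(w) == digits)
```

Against a general dictionary, `t9_candidates("4663", ["good", "home", "gone", "hood"])` is ambiguous four ways; against a hypothetical application vocabulary such as `["home", "next", "previous"]` the same keys resolve to the single word "home".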
Replacing the Recognizers • Build recognizers for T9 and Graffiti • Use Galaxy system to process results from new recognizers [Diagram labels: Language Generation; Speech Synthesis; Dialogue Management; Hub; Audio Database Server; T9 Recog.; Speech Recog.; Discourse Resolution; Language Processing; Graffiti Recog.]
Conclusion • Each application defines an input grammar • This grammar can be used to • Ensure that each application gets valid input • It might not be what the user wanted, but the application will understand it • Reduce false positives • Identify the input suitable for associated application • Choose the application with the highest score • If tie, must do something else (future research) • Enable T9, Graffiti, Speech, other input modes
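The "choose the application with the highest score" step might be sketched as below. The function name, the result format, and the decision to return nothing on a tie are assumptions; the slides explicitly leave tie handling as future research.

```python
from typing import Dict, Optional, Tuple


def select_application(
    results: Dict[str, Optional[Tuple[str, float]]]
) -> Optional[Tuple[str, str]]:
    """Pick the application whose grammar-validated input scored highest.

    results maps an application (or layer) name to (sentence, score), or to None
    when that application's grammar rejected the input. Returns (name, sentence),
    or None when nothing was valid or when the top score is tied.
    """
    valid = {name: r for name, r in results.items() if r is not None}
    if not valid:
        return None
    top = max(score for _, score in valid.values())
    winners = [name for name, (_, score) in valid.items() if score == top]
    if len(winners) != 1:
        return None  # tie: left unresolved here, as in the slides
    name = winners[0]
    return (name, valid[name][0])
```

Because every candidate has already passed its own application's grammar, whichever one is chosen is input that application can understand, even if it is not what the user meant.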
Vision / Gesture Recognition • Laser Pointer • Great for drawing attention to content: audience is primary consumer; secondary use to control presentation • But it is not a mouse: semantics are tied to slide context; differs from Intelligent-room use • Small number of identified gestures: gestures easily punctuated • Low computational overhead: soon will be handled with an H21
Critique of Vision / Gesture Recognition • Laser Pointer • Great for drawing attention to content: cheap technology but mostly distracting; too shaky, imprecise • But it is not a mouse: more awkward to use than mouse; another gadget to hold in the hand, button to identify, batteries to maintain • Small number of identified gestures: there are better ways of drawing attention to slide content; I rarely use it and don’t like it when others do • Low computational overhead: Dumb vs Intelligent Device Discussion
Speech Recognition • Initially seems like great idea • Speaker is already speaking, so can use it to control presentation • Want passive, intelligent listener • Not a dialog • No “prompt” :: alienating distraction • Want no mistakes • For dialog, better to guess than ignore • For us, high cost for incorrect guess • Most words are not relevant to speech system • More trouble than it is worth • But may be good for real-time search of content
More useful aspect – Output modalities • Presenter has put the time and effort into the production • Simpler is better • Audience has harder task • Understand material being presented • Record thoughts, impressions, connections • Filter for later review • Process in real-time • Keep up with presentation • Do all this with minimal distractions • Output modalities • Content for live audience • Content for speaker (superset of audience) • Content for retrieval • Correlate notes with content
Record and correlate notes with presentation
CORE: Communication Oriented Routing Environment (Oxygen Research Group)
Assumptions • Actuators / Sensors (I/O) in the environment • Many are shared by apps & users • Many are flaky / faulty • “User” does not know much about them • Environment, application, user desires change over time
An Oxygen Application • Interconnected Collection of Stuff • Who specifies the stuff? • I don’t know, but it’s mostly virtual stuff • Many layers of abstraction • “Don’t ask, it’s turtles all the way down” • Two main layers of programming • Professionals • Users, e.g. grandmother
Communications-Oriented Programs • Connecting the (virtual) stuff done by user • Home stereo / theater analogy • Plug stuff together; unplug it if it doesn’t work • Don’t like it, unplug it • Device drivers, services, clients don’t know to whom or to what they connect • In client/server model, • server knows a lot about the client, • the client knows even more about the server • Extend Unix Pipes
[Diagram labels: Physical Devices; Programs (Processes); App; Larry Bear’s CORE; App; CORE; CORE; Other COREs; Larry Bear]
Message Flow • Messages flow between nodes & core • Core is both language and router • Within Core Router, some messages • are interpreted and may trigger actions • other messages get routed to other nodes • Request-Reply message strategy • Even number of messages • No reply within time period, means error
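The request-reply strategy, where a missing reply within the time period counts as an error, can be sketched with Python's standard `queue` and `threading` modules. The `request` helper, the `echo_node` stand-in, and the message format are hypothetical and are not CORE's wire protocol.

```python
import queue
import threading


def request(node_inbox: queue.Queue, message: str, timeout: float) -> str:
    """Send message with a private reply queue; no reply within timeout is an error."""
    reply_box: queue.Queue = queue.Queue(maxsize=1)
    node_inbox.put((message, reply_box))
    try:
        return reply_box.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no reply to {message!r} within {timeout}s")


def echo_node(inbox: queue.Queue) -> None:
    """Hypothetical node: replies 'ok: <message>' to each request until it receives None."""
    while True:
        item = inbox.get()
        if item is None:
            break
        message, reply_box = item
        reply_box.put(f"ok: {message}")
```

To try it, run the node in a thread with `threading.Thread(target=echo_node, args=(inbox,), daemon=True).start()` and call `request(inbox, "next slide", 1.0)`. Every request is paired with exactly one reply, which matches the "even number of messages" bullet.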
CORE Language Elements • Four elements • Nodes, • Links, • Messages, • Rules • Features • Interpreted Language • Statement is a message & reply • Each element has an inverse
Nodes – Specify via INS • Nodehandler = (nickname, specifier) • Cam = [device=web-cam; location=518;…] • PTRvision = [device=process; OS=Linux; File=Laser Vision, ..] [Diagram: CORE with nodes Presentation Speech, Slide Speech, Command Speech, Laser Vision]
Node Statement Handler • When ‘node’ message arrives • Verified for correctness (statements allowed) • Routed to Node Manager (just another node) • Node Manager • INS lookup, verifies if allowed, creates if needed • Creates core thread to manage communication with node • Bookkeeping & reply message with handle/error
Links • L_{camera,vision} = (Cam, PTRvision) [Diagram: CORE with nodes Presentation Speech, Slide Speech, Command Speech, Laser Vision]
Link Statement Handler • Message routed to ‘link’ manager • Two queries to node manager for thread controllers • Message to thread controller of source node • Specifying destination thread controller • Message to thread controller of dest node • Specifying source thread controller • Bookkeeping & reply message with handle/error
Messages • Messages flow over the links • Example message: “Next Slide!” [Diagram: CORE with nodes Presentation Speech, Slide Speech, Command Speech, Laser Vision]
Message Handling • Messages can be encrypted • Core statement messages have fixed format • Everything else is data message • Each node thread has two unbounded buffers • Core to node & Node to core • Logging, rollback, fault-tolerance
Rules • RULES: (trigger, action) • ( MESS_{Question} , L_{slide,lcd}-- & L_{slide,qlcd} ) [Diagram: CORE with nodes Presentation Speech, Slide Speech, Command Speech, Questions, Laser Vision]
Rule Statement Handler • ( trigger , consequence ) • Both are “event sets” • Eight basic events: +Node, -Node, +Link, -Link, +Message, -Message, +Rule, -Rule • Event set is a set of events • Trigger is true when events are true • Consequence makes events true
Rules – A link is a rule • A message event is of the form (node, message specifier) or (message specifier, node) • Message came from or is going to node • A link (x,y) is just shorthand for the rule: +(x, m) → ( -(x, m) , +(m, y) ) • If a message m arrives at node x, then make that event false (remove the message) and make the event of m arriving at y from core true.
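A toy model of "a link is a rule", assuming events are kept in a set and a rule fires when its trigger events are all present. The tuple encoding of events, `link_rule`, and `apply_rule` are illustrative inventions. In the slide the message m is a variable, whereas this sketch expands the link for one concrete message at a time.

```python
from typing import Set, Tuple

# ("from", node, message) models the event (node, message specifier);
# ("to", message, node) models the event (message specifier, node).
Event = Tuple[str, str, str]
Rule = Tuple[Set[Event], Set[Event], Set[Event]]  # (trigger, made false, made true)


def link_rule(x: str, y: str, m: str) -> Rule:
    """Expand link (x, y) for one concrete message m into a rule."""
    arrived = ("from", x, m)
    return ({arrived}, {arrived}, {("to", m, y)})


def apply_rule(events: Set[Event], rule: Rule) -> Set[Event]:
    """If every trigger event holds, remove the 'made false' events and add the 'made true' ones."""
    trigger, made_false, made_true = rule
    if trigger <= events:
        return (events - made_false) | made_true
    return events
```

Applying `link_rule("Cam", "PTRvision", "frame1")` to `{("from", "Cam", "frame1")}` leaves only `("to", "frame1", "PTRvision")`, which is the message having moved across the link.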
Rules – Access Control Lists • An access control list is just a rule • When messages arrive at node, if they arrive from valid node, then allowed to continue to flow. • Modifying access control lists is just adding or removing rules.
Rules • Rule statement gets sent to rule manager • Event set is just another shorthand for rules • Rule manager sends command to trigger node thread that tells it about the consequence • Rules are reversible
Reversibility • Each statement is invertible (reversible) • If there is an error in the application specification, then can undo it all. • General debugging is possible with reversible rules and message flow
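Reversibility can be pictured as a statement log that is replayed backwards with each statement inverted. The `Core` class, the statement tuple format, and `undo_all` are assumptions made for this sketch. It also assumes no element is added twice, since otherwise undoing the second add would remove an element that the first add should keep.

```python
from typing import List, Set, Tuple

# (sign, kind, args), e.g. ("+", "link", ("Cam", "PTRvision"))
Statement = Tuple[str, str, tuple]


def inverse(stmt: Statement) -> Statement:
    """Flip the sign: +X becomes -X and vice versa."""
    sign, kind, args = stmt
    return ("-" if sign == "+" else "+", kind, args)


class Core:
    """Toy interpreter that records statements so they can be undone."""

    def __init__(self) -> None:
        self.state: Set[Tuple[str, tuple]] = set()
        self.log: List[Statement] = []

    def execute(self, stmt: Statement, record: bool = True) -> None:
        sign, kind, args = stmt
        if sign == "+":
            self.state.add((kind, args))
        else:
            self.state.discard((kind, args))
        if record:
            self.log.append(stmt)

    def undo_all(self) -> None:
        """Apply the inverse of every logged statement, newest first."""
        while self.log:
            self.execute(inverse(self.log.pop()), record=False)
```

Undoing newest-first matters because later statements (a link) can depend on earlier ones (the nodes it connects).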