Communications, Collaboration, and Community. Anoop Gupta Microsoft Research Collaborators: Michael Cohen, Ross Cutler, Zicheng Liu, Yong Rui, Kentaro Toyama, Zhengyou Zhang, and others. Deployment-Driven Multidisciplinary Research: Challenges and Opportunities. Anoop Gupta
Communications, Collaboration, and Community Anoop Gupta Microsoft Research Collaborators: Michael Cohen, Ross Cutler, Zicheng Liu, Yong Rui, Kentaro Toyama, Zhengyou Zhang, and others
Deployment-Driven Multidisciplinary Research: Challenges and Opportunities Anoop Gupta Microsoft Research Collaborators: Michael Cohen, Ross Cutler, Zicheng Liu, Yong Rui, Kentaro Toyama, Zhengyou Zhang, and others
Collaboration and Multimedia Group • 16 people • 9 Researchers, 5 R-SDEs, 1 Designer, 1 Usability • Diverse: Systems, Cog Psych, Sociologist, Vision, Graphics • Focus: • Peripheral awareness and people-centric interfaces • Tele-presentation and tele-meeting technologies • Make audio-video information a first-class citizen • Enhanced online communities =>Technologies, Applications, and Social Factors
Peripheral awareness and people-centric interfaces • How do we stay aware of relevant information without annoying notifications • How do we stay aware of people, communicate with them, and bring them to the front of the user interface • How can we leverage technology to provide a better idea of people/environment state
Tele-presentations and tele-meetings • Leverage the combination of • cheap sensors (cameras, microphones, …), • cheap computing power, bandwidth, and storage, • Advances in vision-graphics-SP technologies • Convincing remote presence and interactivity • Whiteboard, note-taking, local interaction tools • High quality recording and archiving • Rich indices and browsing support
Make audio-video information a first-class citizen • Low-cost and high-quality capture • Automatic index creation and highlights • Rich support for annotation and collaboration • Browsing tools and interfaces
Enhanced online communities • Tracking Interaction / Social History • Incentive Structures • Encourage high quality content creation • Encourage interaction • Discourage inappropriate behavior • Filtering and Synopsis • Community Portals
Outline • Our group • Research approach • Project samplings • Office activity modeling • Distributed meetings • Tele-presentations • Face modeling • Concluding Remarks / Challenges
Research Approach (diagram labels: Build Prototype, Evaluation / Publication, Refine Prototype, Product Impact) • Deployment-driven research • End-users vs. other researchers as main customer • Robustness vs. Functionality • Multiple sensor technologies with graceful degradation • Value existing infrastructure • Simplicity of set-up and operation • Design with end-user in the loop • Field evaluations • Multi-disciplinary tool-set
1. Office Activity Modeling (joint with ASI group at MSR) • Uses of Office Awareness • Intelligent messaging • Send messages on appropriate channel • instant message, office phone, e-mail, mobile, etc. • Intelligent instant messaging • Stopped typing = not there • Peripheral awareness for “buddies” • Is now a good time to drop by Jack’s office?
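The "send messages on appropriate channel" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Presence` fields, the 120-second idle threshold, and the ordering of the rules are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Presence:
    """Hypothetical snapshot of what the awareness system currently believes."""
    seconds_since_input: float  # time since the last keyboard/mouse event
    on_phone: bool
    in_office: bool


def pick_channel(p: Presence, idle_threshold: float = 120.0) -> str:
    """Return the channel most likely to reach the person right now."""
    if not p.in_office:
        return "mobile"
    if p.on_phone:
        # Do not interrupt a call; leave something to be read later.
        return "e-mail"
    if p.seconds_since_input < idle_threshold:
        # Recently typing, so an instant message will probably be seen.
        return "instant message"
    # In the office but away from the keyboard ("stopped typing = not there").
    return "office phone"
```

A real system would presumably weigh message urgency as well; this sketch only maps presence to a channel.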
So how does the deployment-driven approach impact our decisions?
Environment and Outputs • Environment • Office with door (w/ window); Cubicle; Open plan; … • Number of people • (0 / 1+) | (0 / 1 / 1+) | (0/1/2/3/…) • Gross activity • At desk; On PC ; On phone; In meeting; … • Fine activity • Who are the people present • Reading; Answering mail; … • Activity Trends • Usually comes in at 7am, leaves at 5pm • Never comes in on weekends • …
Sensors • Keyboard / Mouse • Calendar (appointment schedule) • Desktop microphone • TAPI-enabled phone (VoIP) • Desktop camera • Other: • Motion detector; high-quality microphone / headset; bird’s-eye camera; laser/IR gates; thermal cameras, etc.
Making the Inferences… in approximately increasing order of expected research interest • Use reliable sensors as much as possible • Use reliable sensors to label data for other sensors • For vision, stick to reliably extractable, robust cues (e.g., presence of motion, optic flow) • “Quasi-supervised” learning, using data labeled as above
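One way to read "use reliable sensors to label data for other sensors" is sketched below. This is an illustrative assumption, not the group's method: keyboard activity is treated as a reliable "present" label, a hypothetical `screen_locked` flag as a reliable "absent" label, and a single threshold on a vision motion cue is fitted from those labels. The field names and data are invented.

```python
from statistics import mean


def reliable_label(sample):
    """Label a sample from trusted sensors only; None means 'unknown'."""
    if sample["key_events"] > 0:
        return 1  # someone is typing, so someone is present
    if sample["screen_locked"]:
        return 0  # hypothetical signal standing in for 'nobody there'
    return None


def fit_motion_threshold(samples):
    """Midpoint between the mean motion of labeled-present and labeled-absent samples."""
    pos = [s["motion"] for s in samples if reliable_label(s) == 1]
    neg = [s["motion"] for s in samples if reliable_label(s) == 0]
    return (mean(pos) + mean(neg)) / 2.0


def predict_present(sample, threshold):
    """Use the vision cue alone, e.g. when the reliable sensors are silent."""
    return sample["motion"] > threshold


log = [
    {"key_events": 40, "screen_locked": False, "motion": 0.8},
    {"key_events": 12, "screen_locked": False, "motion": 0.6},
    {"key_events": 0, "screen_locked": True, "motion": 0.1},
    {"key_events": 0, "screen_locked": True, "motion": 0.0},
    {"key_events": 0, "screen_locked": False, "motion": 0.7},  # unlabeled, e.g. reading
]
threshold = fit_motion_threshold(log)
```

The last log entry has no reliable label, which is exactly the case where the learned vision threshold is needed.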
Results • Eve/Priorities project at MSR (ASI) • Integrates capture of features (keyboard/mouse use, app use, vision, audio events,…) • Language for combining low-level features • Bayesian fusion • Vision component can determine whether person is facing front or not, but still not as robust as desired • Current work in quasi-supervised learning of low-level features… Hope to deploy base versions in summer
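The slide does not say which model is used for Bayesian fusion. As a minimal sketch, assuming a naive-Bayes model (cues conditionally independent given presence) and made-up likelihoods, binary sensor cues can be fused into a posterior probability that a person is present:

```python
def fuse(prior, likelihoods, observations):
    """Posterior P(present | observations) under a naive-Bayes assumption.

    likelihoods[name] = (P(cue fires | present), P(cue fires | absent))
    observations[name] = True if the cue fired, False otherwise
    """
    p_present, p_absent = prior, 1.0 - prior
    for name, fired in observations.items():
        p_fire_present, p_fire_absent = likelihoods[name]
        if fired:
            p_present *= p_fire_present
            p_absent *= p_fire_absent
        else:
            p_present *= 1.0 - p_fire_present
            p_absent *= 1.0 - p_fire_absent
    return p_present / (p_present + p_absent)


# Hypothetical numbers: keyboard rarely fires when nobody is there,
# while motion detection is noisier.
LIKELIHOODS = {"keyboard": (0.6, 0.01), "motion": (0.8, 0.1)}
```

For example, `fuse(0.5, LIKELIHOODS, {"keyboard": True, "motion": True})` yields a posterior close to 1, while both cues being silent pushes it well below the prior.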
Results (preliminary) • Concatenation of 3 sections of low-level vision data only, sampled from an 8-hour log • Unsupervised clustering segments the sections cleanly.
Results (preliminary) • Correlates with high keyboard/mouse activity, no speech • Ground truth: 1 person at monitor
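The preliminary results mention unsupervised clustering of low-level vision data. The talk does not name the algorithm or the features, so the following is only a sketch under assumptions: a one-dimensional k-means over a hypothetical per-frame motion value, followed by collapsing per-frame cluster labels into contiguous segments.

```python
def kmeans_1d(values, k, iters=20):
    """Cluster scalar features with k-means; returns one label per value."""
    lo, hi = min(values), max(values)
    # Deterministic initialization: centers evenly spaced across the value range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels


def segments(labels):
    """Collapse per-frame labels into (start, end_exclusive, label) runs."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((start, i, labels[start]))
            start = i
    return runs


# Invented feature values standing in for three concatenated log sections.
motion = [0.1, 0.12, 0.09, 0.5, 0.52, 0.48, 0.9, 0.88, 0.91]
runs = segments(kmeans_1d(motion, k=3))
```

With these invented values the three sections fall into three separate clusters, so `runs` contains one run per section.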
Benefits and Challenges • Benefits • Prioritizing problems and context • How far we need to push the solution • Earlier benefits for end-users; enables social science research • Drawbacks • Need substantial engineering (plus algorithmic) skills • Need multidisciplinary team
2. Distributed Small Group Meetings • Scenario: • Imagine 8-10 people • In conference room, from desktops, mobile • Rich back and forth interaction • Archival and browsing support
Contextualized Research Challenges • Novel camera, microphone, display systems • Speaker tracking; multi-person tracking • Gaze and pose correction • Activity tracking and gesture recognition • Graphical avatars and virtual environments • Real and virtual camera management • Automated indexing and browsing support • Integration of handheld devices • User interface / User experience
First Prototype • Omni-directional camera • Meeting environment • 360-degree panorama view • An example omni image
Second Prototype • Cost $300 vs. $10K • Much better quality ~3000 x 500 pixels • All processing done on the PC
Remote Interfaces • All-up • Computer controlled • User controlled • User + Computer + Overview
Short/Medium Term Plan • Cameras, Calibration, Stitching • Camera design to minimize parallax • Automatic camera calibration • Real-time on today’s processors • Speaker detection and multiple-person detection • Microphone array sound source localization • Computer vision tracking of multiple people • Fusing A/V for better speaker detection • Simple remote participation interface • Automatic camera management • Video compression, storage, and transmission • Automatic index creation and meeting browsing Expect to deploy in a few conference rooms during summer
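The plan lists microphone array sound source localization without naming a method. One common textbook approach, used here purely as an assumption, is time-delay-of-arrival estimation between a pair of microphones: find the lag that maximizes the cross-correlation, then convert it to a far-field angle. The signals, sample rate, and microphone spacing below are invented.

```python
import math


def best_lag(a, b, max_lag):
    """Lag in samples by which b is delayed relative to a (brute-force cross-correlation)."""
    def corr(lag):
        return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=corr)


def arrival_angle(lag, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Far-field angle in degrees from broadside for a two-microphone pair."""
    x = speed_of_sound * (lag / sample_rate) / mic_spacing
    # Clamp so that noise-induced lags outside the physical range stay valid for asin.
    return math.degrees(math.asin(max(-1.0, min(1.0, x))))


# The same pulse reaches the second microphone two samples later.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0]
lag = best_lag(mic_a, mic_b, max_lag=4)
angle = arrival_angle(lag, sample_rate=16000, mic_spacing=0.2)
```

A deployed system would need to cope with the background noise and room reverberation that the talk lists as open challenges; this sketch ignores both.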
3. Tele-Presentations • Enable people to • Easily broadcast/capture lectures (speaker and audience) • Esthetically pleasing • Participate from remote locations • Solution components • Tracking cameras, microphone arrays, … • Video production rules from professionals • Mapping of rules to cameras and software video director • Remote presence and interactivity system (TELEP) • First prototype being used in the small lecture room at MSR
Key Modules • Speaker tracking and audience tracking • Computer-vision-based tracking • Microphone-array-based tracking
Key modules (cont) • Virtual video director (FSM) • Maintain min shot duration • Dynamic max shot duration • Function of shot quality • Triggers TIME_EXPIRE event • Monitoring status change • Triggers STATUS event • Encode editing knowledge into transition probabilities
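The virtual video director described above can be sketched as a small finite-state machine. This is a hedged illustration, not the deployed system: the shot names, the 3-second minimum, the linear `max_shot` function, and the transition probabilities are all hypothetical. Only the event names TIME_EXPIRE and STATUS come from the slide.

```python
import random

MIN_SHOT = 3.0  # seconds; hypothetical minimum shot duration

# Hypothetical transition probabilities encoding editing preferences.
TRANSITIONS = {
    "speaker": {"audience": 0.3, "overview": 0.7},
    "audience": {"speaker": 0.8, "overview": 0.2},
    "overview": {"speaker": 0.9, "audience": 0.1},
}


def max_shot(quality, base=5.0, bonus=10.0):
    """Dynamic maximum duration: better shots (quality in [0, 1]) may be held longer."""
    return base + bonus * quality


class Director:
    def __init__(self, start="speaker", rng=None):
        self.shot = start
        self.elapsed = 0.0
        self.rng = rng or random.Random(0)

    def _cut(self):
        """Sample the next shot from the current shot's transition probabilities."""
        options = TRANSITIONS[self.shot]
        self.shot = self.rng.choices(list(options), weights=list(options.values()))[0]
        self.elapsed = 0.0

    def step(self, dt, quality, status_changed=False):
        """Advance time by dt seconds; return the event that fired, or None."""
        self.elapsed += dt
        if self.elapsed < MIN_SHOT:
            return None  # never cut before the minimum duration (event is dropped here)
        if status_changed:
            self._cut()
            return "STATUS"
        if self.elapsed >= max_shot(quality):
            self._cut()
            return "TIME_EXPIRE"
        return None
```

In this sketch a status change that arrives before the minimum duration is simply ignored; deferring it until the minimum has passed would be an equally plausible design.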
Initial Deployment Results • Tested concurrent human operator and our system • Field study • Lab study • Results: • Human operator better, but difference is not statistically significant • People could not distinguish which operator was human and which was computer
Technical Challenges • Design and configuration of camera/microphone systems • More robust lecturer tracking • Smooth tracking in close-up shots • Multiple lecturers • Lecturers move into the audience area • More robust audience tracking • Background noise and room reverberation • More sophisticated rules and knowledge • Human operators have much better ability to deal with exceptions • A flexible/learning automated camera management system
4. Face Modeling • Technical goals: • Build a realistic-looking face model from video images • The face model can be animated right away • Painless in data acquisition & Efficient in model building • Commodity equipment (computer+camera) • No special requirement on the acquisition condition (background, lighting, …) • Uses: • Enhanced chat / gaming environments • Conferencing over low-bandwidth links
Example Application: Virtual Poker • Designed as a social interface • Each player controls an avatar • Some behaviors automatically generated
Virtual Poker • Players automatically turn to follow action/voice (avatar speech bubble: “I guess it’s my turn”)
Research Challenges • Teeth, tongue, eyes and hair • Personalized facial expressions • Real-time animation driven from video • Yet more robust and easy to use
Outline • Our group • Research approach • Project samplings • Office activity modeling • Distributed meetings • Tele-presentations • Face modeling • Concluding Remarks / Challenges
Concluding Remarks (chart axes: % Complete, Effort Spent) • Focus on deployment-driven research • Tremendous leverage in: • Prioritizing problems we explore • Context we assume while solving • How far we push the solution • Earlier benefits for end-users • Enabling social science research • Keeping management support
Challenges: • Need more resources (or pursue fewer things) • Need substantial engineering (plus algorithmic) skills • Premier conferences do not appreciate engineering aspects • Not all important research yields to the above constraints • Some solution options: • Community shared infrastructure (environments) into which things can be plugged (e.g., SUIF for compilers) • Premier conferences’ / senior researchers’ attitudes • Funding agency attitudes
Focus on multidisciplinary research • Tremendous leverage in providing: • More robust solutions (or solutions at all) • More cost-effective solutions • Deployment of research ideas to end-users and knowledge from the resulting feedback • Challenges: • Vision, Video, Graphics, Hardware, Speech, SP, … • Need diversity within the group plus close ties externally • Need supportive management and funding structure • Academic departments, lab research groups, conferences, tenure organized around traditional disciplinary boundaries • Discourages pushing one discipline as hard as possible when another provides an easier answer
Some solution components: • Strong leaders (e.g., Hennessy – brought Arch, Compilers, Prog. Lang, OS folks together) • Premier conferences’ / senior researchers’ attitudes • Funding agency attitudes
Questions / Discussion • Graphics: What is the killer application in the workplace? • Vision: How can we identify the state of the art for a non-expert? • Are you satisfied with the degree of connection with the end-user/reality in your sub-field? • What do you think of the role of multi-disciplinary research? Who should do it? • Do we have balance?
Graphics: What is the killer application in the workplace? • We have tried: • 3D Shell • 3D Avatars in tele-meetings • 3D in visualizations, … • … • Killer application still eludes us
Vision: Identifying the state of the art • E.g., Speech • Speaker dependent or independent • Size of vocabulary • Language model / Grammar / Domain • Microphone quality • What’s the equivalent for vision? • How can we characterize / partition / … the space so that the non-expert knows when/where vision technology can be relied upon?