Computer System Challenges for BiReality

Explore the future of business interactions with cutting-edge sensory technology for a realistic remote experience. Enhancing visual and audio perceptions for immersive meetings.

sbarkman

Presentation Transcript


  1. Computer System Challenges for BiReality July 9, 2003

  2. Motivation • Business travel is time consuming and expensive • $100 billion a year • Current alternatives (audio and video conferences) leave a lot to be desired

  3. Project Overview • Goal: • To create, to the greatest extent practical, both for the user and for the people at the remote location, the sensory experience relevant for business interactions of the user actually being in the remote location.

  4. What Sensory Experiences are Relevant? • Visual perception: • Wide visual field • High resolution visual field • Everyone is “life size” • Gaze is preserved • Colors are accurately perceived • Audio perception: • High-dynamic range audio • Full frequency range • Directional sound field • Mobility: • Able to move around the remote location • Sitting/standing position and height • Other senses (smell, taste, touch, kinesthetic, vestibular) are not as important for most types of business

  5. System Using First-Generation Prototype • What can we do with current technology? • High-bandwidth, low-cost internet • Surrogate at remote location • User in immersion room

  6. Some Core Technologies • Common to both model 1 and 2: • Audio Telepresence • Gaze Preservation

  7. Audio Telepresence • Near CD-quality dynamic range & frequency range • Users should enjoy directional hearing • Enables listening to one person in room full of speaking people • aka the “Cocktail Party Effect” • Users should have directional output • Users can whisper into someone else’s ear in a meeting • This enables selective attention to and participation in parallel conversations • A compelling sense of presence is created (“spooky”) • Challenge: Full-duplex with minimum feedback • Limitation: Latency between local and remote location

  8. Gaze Preservation • Fundamental means of human communication • Very intuitive: toddlers can do it before talking • Not preserved in traditional videoconferencing • Gaze is very useful • Signals focus of attention (on presenter or clock?) • Turn taking in conversations • Remote and local participants must be presented life size for total gaze preservation (harder than 1:1 eye contact) • Otherwise angles don’t match • Life-size presentation also necessary for: • Reading facial expressions • Making everyone an equal participant

  9. Video

  10. 2nd Generation MIMT Advances • Two drivers of advances: • Improvements based on experience with model 1 • Improvements from base technology • More anthropomorphic surrogate • Closer to a human footprint • 360-degree surround video • Enables local rotation (model 3 feature) • Near-Infrared head tracking • Can’t use blue screen anymore • Eliminates blue screen halo in user’s hair • Preserves the user’s head height • User can sit or stand at remote location

  11. Model 2 Surrogate

  12. IR 360

  13. 360-Degree Surround Video • Improved video quality over model 1 • Four 704x480 MPEG-2 streams in each direction • Hardware capture and encoder • Software decoder (derived from HPLabs decoder) • Very immersive - after several minutes: • Users forget where door is on display cube • But users know where doors at remote location are

  14. Problem: Head Tracking in IR 360 • Can’t rely on chromakey • Heads of remote people projected on screens too

  15. Visible Images

  16. IR 360: Track via NIR Difference Keying • Projectors output 3 colors: R, G, B (400-700nm) • 3 LCD panels with color band pass filters • Projectors do not output NIR (700-1000nm) • Projection screens & room would appear black in NIR • Add NIR illuminators to evenly light projection screens • People look fairly normal in NIR (monochrome) • Use NIR cameras to find person against unchanging NIR background
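
The difference-keying idea on slide 16 can be illustrated with a minimal sketch. It assumes 8-bit grayscale NIR frames stored as nested lists and a hypothetical threshold of 30; the data layout, the threshold, and the function names are illustrative assumptions, since the slides do not describe the actual image pipeline.

```python
def nir_difference_mask(background, frame, threshold=30):
    """Return a binary mask: 1 where the frame differs from the static NIR background."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right) of all foreground pixels, or None if there are none."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical data: an evenly lit screen (value 200) with a darker "person" region.
background = [[200] * 6 for _ in range(5)]
frame = [row[:] for row in background]
for r in range(1, 4):
    for c in range(2, 4):
        frame[r][c] = 90

mask = nir_difference_mask(background, frame)
box = bounding_box(mask)
```

Because the projectors emit no NIR, the background frame stays constant no matter what is projected, which is what makes a fixed reference image plausible here.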

  17. Near-Infrared Images

  18. Near-Infrared Differences

  19. Preserving the User’s Head Height • Lesson from model 1: Hard to see “eye-to-eye” with someone if you are sitting and they are standing • Formal interactions sitting (meeting in conference room) • Casual conversations standing (meet in hallway) • Model 2 system supports both: • Computes head height of user by NIR triangulation • Surrogate servos height so user’s eyes at same level on display • User controls just by sitting or standing • No wires on user => very natural
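
Slide 19 says head height is computed by NIR triangulation but gives no geometry. The sketch below is one possible setup, not the project's method: it assumes two cameras mounted at the same height on opposite sides of the user, with the head lying in the vertical plane between them, and each camera reporting the elevation angle to the top of the head. All names and numbers are hypothetical.

```python
import math


def head_height(baseline_m, camera_height_m, elev1_rad, elev2_rad):
    """Triangulate head height from two elevation angles measured by facing cameras.

    With the head at horizontal distance d1 from camera 1 and d2 from camera 2:
        tan(elev1) = (z - h) / d1,  tan(elev2) = (z - h) / d2,  d1 + d2 = baseline
    which gives z = h + baseline * t1 * t2 / (t1 + t2).
    """
    t1, t2 = math.tan(elev1_rad), math.tan(elev2_rad)
    if abs(t1 + t2) < 1e-12:
        # Degenerate case, e.g. both sight lines horizontal: head is at camera height.
        return camera_height_m
    return camera_height_m + baseline_m * t1 * t2 / (t1 + t2)


# Hypothetical example: cameras 3 m apart at 1.0 m, head 1 m from camera 1 at 1.7 m.
a1 = math.atan2(0.7, 1.0)
a2 = math.atan2(0.7, 2.0)
height = head_height(3.0, 1.0, a1, a2)
```

The resulting height would then be the set point for the surrogate's height servo.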

  20. Preserving Gaze and Height Simultaneously • Since the user can stand or sit down, a single camera won’t preserve gaze vertically • Similar problem to camera on top of monitor • Solution: • Use 4 color cameras in each corner of display cube • Select between them using video switcher based on user’s eye height • Eye height computed from head height via NIR • Tilt cameras in head of surrogate at same angle as user’s eyes to center of screen • Angle computed via NIR head tracking • Warp video in real time so that adjacent videos still match for panorama • When user stands or sits down, their perspective changes as if the screen was a window
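
The camera selection and tilt steps on slide 20 might look roughly like the following. This is a sketch under stated assumptions: the four cameras in a corner are taken to sit at different heights, the mounting heights are invented, and the function names are hypothetical.

```python
import math


def select_camera(eye_height_m, camera_heights_m):
    """Return the index of the camera whose height is closest to the user's eye height."""
    return min(
        range(len(camera_heights_m)),
        key=lambda i: abs(camera_heights_m[i] - eye_height_m),
    )


def tilt_angle_deg(eye_height_m, screen_center_height_m, distance_m):
    """Angle from the user's eyes to the screen center; positive means looking up."""
    return math.degrees(
        math.atan2(screen_center_height_m - eye_height_m, distance_m)
    )


# Hypothetical mounting heights for the four cameras in one corner.
camera_heights = [0.9, 1.2, 1.5, 1.8]

standing_cam = select_camera(1.68, camera_heights)
sitting_cam = select_camera(1.25, camera_heights)
level_tilt = tilt_angle_deg(1.5, 1.5, 1.0)
upward_tilt = tilt_angle_deg(0.5, 1.5, 1.0)
```

Selecting the nearest camera stands in for the video switcher; the tilt value is what the cameras in the surrogate's head would be driven to.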

  21. Enhancing Mobility with Model 3 • Many important interactions take place outside meeting rooms • Need mobility to visit offices, meet people in common areas • Lesson from model 1: teleoperated mechanical motion is unimmersive • Holonomic platform for Model 2 in development • Can move in any direction without rotation of platform • User rotates in display cube, not at remote location • No latency or feedback delay • Natural and immersive • Base will move perpendicular to plane of user’s hips • People usually move perpendicular to hip plane when walking • Speed will be controlled by wireless handgrip
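
The rule that the base moves perpendicular to the plane of the user's hips can be written as a small vector calculation. This sketch assumes the hip line is reported as an angle in room coordinates, that "forward" is 90 degrees counter-clockwise from that line, and that the handgrip supplies a scalar speed; all of these conventions are assumptions for illustration.

```python
import math


def platform_velocity(hip_angle_rad, speed):
    """Velocity (vx, vy) perpendicular to the hip line, with no platform rotation.

    The hip line points along (cos a, sin a); the perpendicular is (-sin a, cos a).
    """
    return (-speed * math.sin(hip_angle_rad), speed * math.cos(hip_angle_rad))


# Hips aligned with the x axis: the platform should translate along +y.
vx, vy = platform_velocity(0.0, 1.0)

# Hips rotated by 30 degrees: velocity stays perpendicular to the hip line.
a = math.radians(30)
wx, wy = platform_velocity(a, 0.5)
dot_with_hip_line = wx * math.cos(a) + wy * math.sin(a)
```

A holonomic base can execute such a velocity directly, so the user's rotation never has to be replayed mechanically at the remote site.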

  22. Model 3 Base • Same approximate form factor as model 2 base • Holonomic design with six wheels for enhanced stability

  23. Overview of System-Level Challenges • Mobile computing in the large • Model 1 and Model 3 surrogates run off batteries • Only have 1-2 kWh (1000-2000 AAA cells) • Extreme computation demands from multimedia • 3.06 GHz HT machines only capable of two 704x480 streams at 10 fps • Already use hardware MPEG-2 encoders • Would like frame rates of 15 to 30 fps • 15 fps should be possible with 0.09 µm processors • Still only provides 20/200 vision • 20/20 vision would require 100X pixels, ? bits, ? MIPS
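
The "100X pixels" figure follows from Snellen acuity: going from 20/200 to 20/20 means resolving detail 10 times finer in each dimension, hence 100 times the pixels over the same field of view. A small worked check, using the 704x480 stream size from the slides (the function name is hypothetical):

```python
def pixel_scale(current_snellen_denominator, target_snellen_denominator):
    """Factor by which pixel count must grow to reach the target acuity."""
    linear = current_snellen_denominator / target_snellen_denominator
    return linear ** 2  # resolution improves in both width and height


scale = pixel_scale(200, 20)
pixels_now = 704 * 480
pixels_needed = pixels_now * scale
```

The slide leaves the corresponding bit rate and MIPS as open questions, so this sketch does not attempt to estimate them.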

  24. Model 2 vs. Model 3 Surrogate • Model 2 is powered via a wall plug • Not mobile, but • Avoids power issues for now • Moore’s Law leads to more computing per Watt with time • Stepping stone to mobile model 3 • Only one model 2 prototype • Learn as much as possible before model 3

  25. Power Issues • Reduction in PCs from 4 to 2 saves power • Model 1 power dissipation = 550W • Model 2 power dissipation = 250W • Motion uses relatively little power • Good news – most power scales with Moore’s Law
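
To see why the power figures matter for a battery-powered surrogate, the 1-2 kWh battery capacity quoted earlier can be divided by these dissipation numbers. This is a rough sketch that ignores conversion losses and battery derating. Model 2 is wall-powered, so its 250 W figure is used here only as a hypothetical load for a mobile successor.

```python
def runtime_hours(capacity_wh, power_w):
    """Ideal runtime of a battery of the given capacity at a constant power draw."""
    return capacity_wh / power_w


# Model 1 at 550 W on 1 kWh and 2 kWh of batteries.
model1_low = runtime_hours(1000, 550)
model1_high = runtime_hours(2000, 550)

# Hypothetical: a mobile surrogate drawing Model 2's 250 W.
at_250w_low = runtime_hours(1000, 250)
at_250w_high = runtime_hours(2000, 250)
```

Under these assumptions, halving the power roughly doubles the time between charges, from about 2 to 4 hours up to 4 to 8 hours.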

  26. CPU and Graphics Requirements • Performance highly dependent on both • Graphics card performance • Dual screen display, 2048x768 • Most texture mapping for games uses semi-static textures • We have to download new textures for each new video frame • 720x480x15x4 = 21MB/sec per video stream • 42MB/sec per card • CPU performance • Currently use 3.06GHz hyperthreaded Pentium 4, DDR333 • MPEG decoders have to produce bits at 42MB/sec • Currently uses 25% of CPU per video stream • 50% of CPU for two • Makes you really appreciate a 1940’s tube TV design
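
The texture bandwidth arithmetic on slide 26 can be reproduced directly. The slide does not label the factors; this sketch assumes 15 is the frame rate and 4 is bytes per pixel (for example a 32-bit texture format), which is an interpretation rather than something the slides state.

```python
def texture_upload_rate(width, height, fps, bytes_per_pixel):
    """Bytes per second needed to upload every decoded frame as a new texture."""
    return width * height * fps * bytes_per_pixel


per_stream = texture_upload_rate(720, 480, 15, 4)  # bytes/s for one video stream
per_card = 2 * per_stream                          # two streams per graphics card
cpu_share_two_streams = 2 * 0.25                   # 25% of the CPU per decoded stream

per_stream_mb = per_stream / 1e6
per_card_mb = per_card / 1e6
```

The exact per-card value is about 41.5 MB/s; the slide's 42 MB/s comes from doubling the rounded per-stream figure of 21 MB/s.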

  27. WLAN Challenges • Model 2 bandwidth requirements • About 21 Mbit/s total • 8 MPEG-2 full D1 (720x480) video streams (95% of bandwidth) • 5 channels of near CD-quality audio (only 5% of total bandwidth) • Getting these to work with low latency over 802.11a is challenging • Packet loss and lack of QoS • Like all WLAN, bandwidth is a strong function of distance • Vendors don’t like to advertise this • 1/9 bandwidth at larger distances common • Currently use 2nd generation Atheros 802.11a parts via Proxim • 108 Mbit/s max in Turbo mode • 12 Mbit/s at longer ranges
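
The bandwidth budget on slide 27 can be broken down per stream and compared with the link rates quoted. This is a back-of-the-envelope sketch: it treats the quoted 108 and 12 Mbit/s figures as usable capacity, whereas real 802.11a throughput is lower than the signalling rate, and the variable names are hypothetical.

```python
TOTAL_MBIT = 21.0          # total Model 2 requirement from the slide

video_total = TOTAL_MBIT * 0.95
audio_total = TOTAL_MBIT * 0.05
per_video_stream = video_total / 8    # 8 MPEG-2 streams
per_audio_channel = audio_total / 5   # 5 audio channels

long_range_rate = 108 / 9             # "1/9 bandwidth at larger distances"


def fits(link_rate_mbit, required_mbit=TOTAL_MBIT):
    """True if the link rate is at least the required aggregate rate."""
    return link_rate_mbit >= required_mbit


fits_at_close_range = fits(108)
fits_at_long_range = fits(long_range_rate)
```

Each video stream comes out at roughly 2.5 Mbit/s and each audio channel at roughly 0.2 Mbit/s, and the full load no longer fits once the link drops to 12 Mbit/s.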

  28. UDP • Model 1 & 2 systems developed with TCP • TCP plus WLAN at near capacity doesn’t work • TCP can add delay • Small video artifacts better than longer latency for interactive use • Converting to UDP now • Requires adding error detection and concealment to MPEG-2 decoder
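
Moving from TCP to UDP means the application has to notice lost or damaged data itself. The sketch below shows one common approach, not the project's actual protocol: a hypothetical header with a sequence number and payload length, plus a CRC-32 trailer, so the receiver can detect gaps and hand them to a concealment step in the decoder. UDP's own checksum already catches most corruption, so the CRC here is mainly illustrative.

```python
import struct
import zlib

# Hypothetical header: 32-bit sequence number, 16-bit payload length (network order).
HEADER = struct.Struct("!IH")


def make_packet(seq, payload):
    """Prefix the payload with the header and append a CRC-32 of header plus payload."""
    body = HEADER.pack(seq, len(payload)) + payload
    return body + struct.pack("!I", zlib.crc32(body))


def parse_packet(packet):
    """Return (seq, payload), or None if the packet is truncated or fails its CRC."""
    if len(packet) < HEADER.size + 4:
        return None
    body = packet[:-4]
    (crc,) = struct.unpack("!I", packet[-4:])
    if zlib.crc32(body) != crc:
        return None
    seq, length = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if len(payload) != length:
        return None
    return seq, payload


def missing_sequences(received_seqs):
    """List sequence numbers skipped between the lowest and highest one received."""
    seen = set(received_seqs)
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


packet = make_packet(7, b"slice")
parsed = parse_packet(packet)

damaged = bytearray(packet)
damaged[HEADER.size] ^= 0xFF          # corrupt the first payload byte
parsed_damaged = parse_packet(bytes(damaged))

gaps = missing_sequences([1, 2, 4, 7])
```

A decoder could use the reported gaps to decide which picture regions to conceal instead of waiting for a retransmission, which matches the slide's preference for small artifacts over added latency.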

  29. Low Latency Video and Audio • Necessary for interactive conversations • Problems: • Buffering needed is a function of system and network load • Lightly loaded PC on Internet2: little buffering needed • Heavily loaded PC on WLAN: more buffering needed • Windows 2000 is not a real-time OS • Hyperthreading or dual CPUs help responsiveness • Model 1 used dual CPU systems • Model 2 uses HT CPUs

  30. Summary • We are close to achieving a useful immersive experience • This is significantly better for unstructured human communication • Many key qualities not preserved in videoconferencing, including: • Gaze • Directional hearing • 360 degree surround vision • BiReality implementation technologies (PCs, projectors, etc.) are not that expensive and are getting cheaper, faster, and better • Enables lots of interesting research

  31. MIMT Project Team • Norm Jouppi, Wayne Mack, Subu Iyer, Stan Thomas and April Slayden (intern) in Palo Alto • Shylaja Sundar Rao, Jacob Augustine, Shivaram Rao Kokrady, and Deepa Kuttipparambil of the India STS

  32. Demo in 1L Lab • Across from Sigma conference room
