UBIQ – Low Bandwidth Visual Communication

Jonathan H. Connell • Exploratory Computer Vision Group • IBM T. J. Watson Research Center • jconnell@us.ibm.com

What is it? • Links camera phone to any PC • PC user can see video, snap pictures • Good for a quick “beam in”

UBIQ concept: The expert can be everywhere • Field service dilemma (e.g. repair): • Most problems have simple solutions • Some field-service problems require experts • Experts are expensive, want to utilize effectively • Medium-skilled labor can fix many problems • Hybrid solution • Send out medium-skilled person for quick fix in most cases • Call back to main office for more difficult problems • “Beaming in” the expert • Sometimes verbal communication is insufficient • Pictures can be sent, but take a long time to transmit • Person in the field might take picture of wrong aspect • Provide a real-time “viewfinder” mode to allow expert to quickly snap the right picture on a remote mobile phone

Scenario: fixing a copier • Customer calls in problem = streaks on paper • Local maintenance guy shows up promptly • Checks for correct paper & toner level • Calls back to home office for advice • Shows paper markings • Expert asks for view of “fuser roller” • “What’s that?” • Local person uses video mode to get to correct location • Expert snaps image and examines • Fix problem by using alcohol wipe on this component (marked) [Speech bubbles: “… closer …”, “… there. Open the side …”]


Other Scenarios • Inspection at construction site • Concrete slab slipping down hill in Brazil • Fly in a civil engineer (while site idles) • Problem really requires a hydrologist? • Specialized medical consultation • Remote clinic in Botswana • Experts can’t (or won’t) travel there quickly • Check out foot rash without fear of contagion

Value Proposition: Making the top of the skill pyramid virtually ubiquitous. • Lower cost of operations • No expense for cars, plane trips, lodging … • Right brain at the right place quickly • Can easily change experts if needed • No delays due to flights, visa approval … • Increases customer satisfaction • Better leverage existing expertise • No time lost on travel (or getting lost) • Bigger expert recruitment pool • No onerous travel or relocation • Social skills less important

Critical Point: Designing around bandwidth • Verizon 3G EV-DO cites (uncompressed data): • Rev A peak: down = 600-1400 kbaud, up = 500-800 kbaud • Non Rev A peak: down = 400-700 kbaud, up = 60-80 kbaud • Local test: uplink 200KB in 25 sec → 8KB / sec = 64 kbaud • Older CDPD / GPRS networks = 9.6-40 kbaud • Remote areas in US (Nebraska) • Developing countries (South Africa) [Coverage maps: USA: blue = 400 kbaud, green = 50 kbaud; South Africa: blue = 30 kbaud]

Video transmission • Uplink bandwidth intrinsically limited • Handset radiated power (batteries, FCC limits) • Distance to base station • Generally assume 10-50 kbaud (like old dial-up) • Motion “video” requires 5-10 fps • H.264 (MPEG-4) lowest = 64 kbaud for 176x144 @ 15fps • WMV for dialup = 38 kbaud for 160x120 @ 15 fps • Need very low-bandwidth codecs • 350 bytes / frame @ 53 kbaud for 15fps • 100-200 bytes / frame @ 10 kbaud for 5-10fps
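The per-frame byte budgets quoted above can be reproduced with a short calculation. This is only a sketch: the figures line up if each byte costs 10 bits on the link (8 data bits plus start/stop framing, as on a serial modem), which is an assumption and is not stated on the slide. The function name `bytes_per_frame` is hypothetical.

```python
def bytes_per_frame(link_baud: float, fps: float, bits_per_byte: int = 10) -> float:
    """Frame-size budget in bytes for a given link rate and frame rate.

    bits_per_byte=10 is an assumption (8 data bits plus start/stop framing);
    it is chosen because it reproduces the numbers on the slide.
    """
    return link_baud / (bits_per_byte * fps)


budget_15fps = bytes_per_frame(53_000, 15)  # roughly 350 bytes
budget_10fps = bytes_per_frame(10_000, 10)  # 100 bytes
budget_5fps = bytes_per_frame(10_000, 5)    # 200 bytes
```

Under this assumption a 10 kbaud uplink leaves only 100-200 bytes per frame at 5-10 fps, which is the budget the viewfinder codecs have to fit.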

Key technology • Low-bandwidth viewfinder suited to task • WHY: Allows expert to guide image acquisition more effectively • HOW: Use computer vision techniques to focus on “semantic” aspects • US patent 7,219,364 to IBM “System and Method for Selectable Semantic Codec Pairs for Very Low Data-Rate Video Transmission” Rudolf Bolle & Jonathan Connell (filed Feb. 2001, issued May 2007) Claims: • A system for compressing one or more video streams comprising: one or more image input devices creating the one or more video streams; and a selector process that selects a semantic compression process out of a set of semantic compression processes, the selected semantic compression process compressing the one or more video streams based on a task that required the compression of the one or more video streams and that utilizes content of the one or more video streams.

Codec 1: JPEG stills • Compression settings • Moderate resolution • low quality (50) • Balance of clarity & speed • Non-linear with resolution • 32 x 24 = 812 bytes (0.8 secs): 4x fewer pixels, 1.5x faster • 64 x 48 = 1242 bytes (1.2 secs) • 128 x 96 = 2765 bytes (2.8 secs): 4x more pixels, 2.2x slower • Network issues
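The transmit times and speed ratios for the three JPEG sizes can be checked as follows. This is a sketch under two assumptions that are not stated on this slide: a 10 kbaud link and 10 link bits per byte. The byte counts are the ones listed above; the names `JPEG_SIZES` and `transmit_seconds` are hypothetical.

```python
# JPEG sizes in bytes, keyed by (width, height), as listed on the slide.
JPEG_SIZES = {(32, 24): 812, (64, 48): 1242, (128, 96): 2765}


def transmit_seconds(n_bytes: int, link_baud: int = 10_000, bits_per_byte: int = 10) -> float:
    """Time to send n_bytes; link rate and bits per byte are assumptions."""
    return n_bytes * bits_per_byte / link_baud


times = {res: round(transmit_seconds(n), 1) for res, n in JPEG_SIZES.items()}

# Size does not scale linearly with pixel count:
speedup_small = JPEG_SIZES[(64, 48)] / JPEG_SIZES[(32, 24)]     # 4x fewer pixels
slowdown_large = JPEG_SIZES[(128, 96)] / JPEG_SIZES[(64, 48)]   # 4x more pixels
```

The ratios show why resolution is a weak lever here: quartering the pixel count saves only about a third of the bytes.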

Interaction with network • Ethernet TCP/IP packet structure: • 8 bytes Ethernet framing • 20 byte TCP header • 14 bytes Ethernet MAC header • 46-1500 bytes payload • 4 bytes CRC check code • Effective bandwidth over raw 10 kbaud link: • 100 bytes → 146 bytes = 8.6 fps (32% overhead) • 200 bytes → 246 bytes = 5.1 fps (19% overhead) • 1000 bytes → 1046 bytes = 1.2 fps (4% overhead) • Nagle algorithm in TCP • Tries to combine small packets for better efficiency • Need to disable for acceptable latency (and smoothness) • Delayed ACK in TCP • Multi-packet transmit can be delayed 200ms if no down-linked command
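The effective frame rates above follow from adding the 46-byte per-packet tally (8 + 20 + 14 + 4) to each payload, at 8 bits per byte. The sketch below reproduces that arithmetic and also shows the standard socket option for turning off the Nagle algorithm. The function names are hypothetical; the slides do not show how the original implementation configured its sockets.

```python
import socket

# Per-packet overhead as tallied on the slide: 8 + 20 + 14 + 4 = 46 bytes.
OVERHEAD_BYTES = 8 + 20 + 14 + 4


def effective_fps(payload_bytes: int, link_bps: int = 10_000,
                  overhead: int = OVERHEAD_BYTES) -> float:
    """Frames per second when each frame travels in one packet."""
    return link_bps / (8 * (payload_bytes + overhead))


def overhead_percent(payload_bytes: int, overhead: int = OVERHEAD_BYTES) -> float:
    """Share of each packet that is header rather than image data."""
    return 100 * overhead / (payload_bytes + overhead)


def make_low_latency_socket() -> socket.socket:
    """TCP socket with the Nagle algorithm disabled via TCP_NODELAY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

With `TCP_NODELAY` set, each small frame is sent as soon as it is written instead of waiting to be coalesced, trading some efficiency for lower latency.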

Codec 2: Progressive gray • Low spatial and intensity resolution • 16 x 12 pixels • 4 bit gray scale • Image = 96 bytes • 10fps @ 10 kbaud • No Huffman coding • not effective on short messages [Figure: interpolated 8 bits (16 x 12 x 8 bits = 192 bytes) vs. interpolated 4 bits (16 x 12 x 4 bits = 96 bytes): nearly identical]
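A 16 x 12 image at 4 bits per pixel fits in 96 bytes when two pixels share each byte. The following is a minimal sketch of such packing, assuming the top four bits of each 8-bit gray value are kept; the actual bit layout used by UBIQ is not given in the slides, and `pack_4bit` / `unpack_4bit` are hypothetical names.

```python
def pack_4bit(pixels):
    """Quantize 8-bit gray values to 4 bits and pack two pixels per byte."""
    if len(pixels) % 2:
        raise ValueError("expected an even number of pixels")
    nibbles = [p >> 4 for p in pixels]  # keep the top 4 bits
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_4bit(data):
    """Expand packed nibbles back to the 0..255 range."""
    out = []
    for b in data:
        for nib in (b >> 4, b & 0x0F):
            out.append(nib * 17)  # maps 0..15 onto 0..255
    return out


# Synthetic 16 x 12 test frame (row-major), just to exercise the packing.
frame = [(x * 16 + y) % 256 for y in range(12) for x in range(16)]
packed = pack_4bit(frame)
restored = unpack_4bit(packed)
```

The receiver would then interpolate the restored 16 x 12 grid up to display size, as in the figure.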

Algorithm • Progressive refinement • Send very low 4 bit resolution base • Send next resolution in 4 pieces • Send best resolution in 16 pieces • Add in low order bits in 16 pieces • Motion sensitivity • If basic scene changes start with new base image • Add resolution from the center outward • Long term stability • Don’t replace a good resolution image with a poorer one • Send new best resolution in 32 pieces in background

Refinement sequence • Base 16 x 12 pixels • Central quarter 1 • Central quarters 1 & 2 • Central quarters 1 & 2 & 3

Resolution sequence • 16 x 12 @ 4 bits (0.1 secs) • 32 x 24 @ 4 bits (0.5 secs) • 64 x 48 @ 4 bits (2.1 secs) • 64 x 48 @ 8 bits (3.6 secs)
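The progressive scheme has a convenient property: with the piece counts given in the algorithm (1, 4, 16, 16), every piece works out to 96 bytes. The sketch below tabulates the schedule, assuming pieces are delivered at 10 per second (the 10 fps rate quoted for this codec). The names `STAGES` and `schedule` are hypothetical. Under these assumptions the first three cumulative times match the resolution sequence (0.1, 0.5, 2.1 s), while the last comes out at 3.7 s against the 3.6 s listed, so the real schedule probably differs in some detail.

```python
PIECES_PER_SECOND = 10  # assumption: one 96-byte piece per frame slot at 10 fps

# (label, width, height, bits added per pixel, number of pieces)
STAGES = [
    ("16 x 12 @ 4 bits", 16, 12, 4, 1),
    ("32 x 24 @ 4 bits", 32, 24, 4, 4),
    ("64 x 48 @ 4 bits", 64, 48, 4, 16),
    ("64 x 48 low-order 4 bits", 64, 48, 4, 16),
]


def schedule():
    """Return (label, bytes per piece, cumulative seconds) for each stage."""
    rows, sent = [], 0
    for label, width, height, bits, pieces in STAGES:
        piece_bytes = width * height * bits // 8 // pieces
        sent += pieces
        rows.append((label, piece_bytes, sent / PIECES_PER_SECOND))
    return rows


for label, piece_bytes, seconds in schedule():
    print(f"{label}: {piece_bytes} bytes/piece, done after {seconds} s")
```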

Codec 3: Prominent lines • Convolve with Sobel masks • X mask rows: (1 0 -1), (2 0 -2), (1 0 -1) • Y mask rows: (1 2 1), (0 0 0), (-1 -2 -1) • Y vs. X = angular direction • RMS value = magnitude [Panels: Input, Edge Magnitude, Edge Direction (only 4 matter)]
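The Sobel step can be illustrated on a single pixel. This sketch applies the two 3 x 3 masks as a correlation (no kernel flip, which only changes the sign of the responses), takes the direction from the Y and X responses, and takes the magnitude as their RMS value. The function name `sobel_at` and the test image are hypothetical.

```python
import math

SOBEL_X = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def sobel_at(img, x, y):
    """Sobel responses at interior pixel (x, y) of a row-major gray image."""
    gx = gy = 0
    for j in range(3):
        for i in range(3):
            v = img[y + j - 1][x + i - 1]
            gx += SOBEL_X[j][i] * v
            gy += SOBEL_Y[j][i] * v
    magnitude = math.sqrt((gx * gx + gy * gy) / 2)  # RMS of the two responses
    direction = math.atan2(gy, gx)                  # Y vs. X, in radians
    return gx, gy, magnitude, direction


# Vertical step edge: dark on the left, bright on the right.
step = [[0 if x < 2 else 10 for x in range(5)] for _ in range(5)]
gx, gy, mag, ang = sobel_at(step, 2, 2)
```

For the step image only the X mask responds, so the edge is classified as vertical. Since only four directions matter, the angle would be quantized coarsely in practice.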

Choosing edges • Separate into horizontal and vertical edges • Find connected components • Determine maximal length elements • Keep best N
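The selection step can be sketched as a connected-components pass followed by a ranking. The version below assumes the edge pixels of one orientation (horizontal or vertical) are already available as a set of (x, y) coordinates, uses 8-connectivity, and uses pixel count as a stand-in for length. All three are assumptions, since the slides do not specify them.

```python
from collections import deque


def connected_components(pixels):
    """Group a set of (x, y) edge pixels into 8-connected components."""
    remaining = set(pixels)
    components = []
    while remaining:
        seed = remaining.pop()
        blob, queue = [seed], deque([seed])
        while queue:
            x, y = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (x + dx, y + dy)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        blob.append(neighbor)
                        queue.append(neighbor)
        components.append(blob)
    return components


def keep_best(components, n):
    """Keep the n largest components (pixel count as a proxy for length)."""
    return sorted(components, key=len, reverse=True)[:n]
```

Running the pass separately on the horizontal and vertical edge sets keeps crossing edges from merging into one blob.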

Approximating edges • Find blob parameters • First order moments (centroid) • Second order moments (inertia) • Bounding box (max & min of x, y) • Get line endpoints • Line passes through centroid • Line is parallel to the axis of minimum inertia • Clip to bounding box • Better than least squares • Not just minimum y error [Figure: pixel pattern]
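The moment-based fit can be sketched as follows: compute the centroid and central second moments of a blob, take the principal-axis angle from the standard formula theta = 0.5 * atan2(2 * mu11, mu20 - mu02), and clip the resulting line to the blob's bounding box. The function name `fit_line` is hypothetical and the clipping details are one reasonable choice, not necessarily the original's.

```python
import math


def fit_line(blob):
    """Fit a segment (x0, y0, x1, y1) to a list of (x, y) pixels via moments."""
    n = len(blob)
    xs = [x for x, _ in blob]
    ys = [y for _, y in blob]
    cx, cy = sum(xs) / n, sum(ys) / n                 # first-order moments
    mu20 = sum((x - cx) ** 2 for x in xs) / n         # second-order moments
    mu02 = sum((y - cy) ** 2 for y in ys) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in blob) / n
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)   # principal axis angle
    dx, dy = math.cos(theta), math.sin(theta)

    # Clip the infinite line through the centroid to the bounding box.
    t_min, t_max = -math.inf, math.inf
    for c, d, lo, hi in ((cx, dx, min(xs), max(xs)), (cy, dy, min(ys), max(ys))):
        if abs(d) > 1e-9:  # skip an axis the line runs parallel to
            t1, t2 = (lo - c) / d, (hi - c) / d
            t_min = max(t_min, min(t1, t2))
            t_max = min(t_max, max(t1, t2))
    return (cx + t_min * dx, cy + t_min * dy, cx + t_max * dx, cy + t_max * dy)
```

Unlike a least-squares fit of y on x, this treats both coordinates symmetrically, so near-vertical edges are handled as well as near-horizontal ones.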

Final line version • Keep and code 50 best • (x0, y0, x1, y1) in 240x180 • 200 bytes total → 5fps [Figure: input image]
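Fifty lines in 200 bytes implies 4 bytes per line, i.e. one byte per coordinate, which works because both 240 and 180 fit in 0..255. The sketch below packs lines that way. The byte order within a line and the names `encode_lines` / `decode_lines` are assumptions.

```python
import struct

MAX_LINES = 50  # keep and code the 50 best


def encode_lines(lines):
    """Pack (x0, y0, x1, y1) tuples, ranked best first, one byte per value."""
    out = bytearray()
    for x0, y0, x1, y1 in lines[:MAX_LINES]:
        out += struct.pack("4B", x0, y0, x1, y1)
    return bytes(out)


def decode_lines(data):
    """Inverse of encode_lines: returns a list of (x0, y0, x1, y1) tuples."""
    return [struct.unpack_from("4B", data, i) for i in range(0, len(data), 4)]


# Hypothetical input: 60 candidate lines, of which only the first 50 are sent.
candidates = [(i, i, 239, 179) for i in range(60)]
payload = encode_lines(candidates)
```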

Client side smoothing • Blend successive frames: previous + now = mixed (grayed) • But only if low motion: now - fattened previous = “extra” edges → moved
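One way to read the smoothing rule is sketched below, with each frame represented as a set of edge pixels. The previous frame is fattened (dilated by one pixel), and any current pixels outside it count as "extra" edges. If too many are extra, the scene is treated as moved and only the current frame is shown; otherwise the two frames are blended so that pixels present in only one appear grayed. The threshold `max_extra_fraction`, the intensity values, and the function names are all hypothetical.

```python
def fatten(pixels):
    """Dilate a set of (x, y) pixels by one pixel in every direction."""
    return {(x + dx, y + dy)
            for x, y in pixels
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)}


def smooth(previous, now, max_extra_fraction=0.2):
    """Return {pixel: intensity} for display on the client.

    max_extra_fraction is an assumed threshold, not taken from the slides.
    """
    extra = now - fatten(previous)
    if now and len(extra) / len(now) > max_extra_fraction:
        return {p: 1.0 for p in now}          # moved: show current frame only
    shown = {p: 0.5 for p in previous | now}  # mixed (grayed)
    for p in previous & now:
        shown[p] = 1.0                        # stable edges stay at full strength
    return shown
```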

Comparison of codecs • Different rates: 10 fps, 5 fps, 0.8 fps • Color vs. gray • Iconic vs. graphical [Panels: Input, Progressive, Lines, JPEG]

UBIQ summary • Enhances visual communication • Multiple viewfinder codecs • Remote acquisition controls • Image mark-up possible • Fundamentals covered under US patent • Single platform implementation • Windows XP (PC client) • Windows Mobile 5.0 (Smartphone server) • Demo possible • http://www.research.ibm.com/people/j/jhc/ubiq/

Future work • Field testing • See which codecs are useful for which tasks • Porting to other phones • Java, Symbian (camera access?) • Development of additional codecs • Area based analog to lines • Hybrid lines + blobs • Spatially varying resolution • Camera tracking partial stills • Quick remote zoom refinement
