ViSiCAST 2001 Technical Audit 8 October 2001, Brussels Michele Wakefield - Project Manager, ITC
The ViSiCAST Project: Virtual Signing, Capture, Animation, Storage and Transmission
Aims of ViSiCAST Project “…support improved access by deaf citizens to information and services in sign language” • user-friendly methods to capture & generate signs • machine-readable system to describe gestures • ... preferred medium is sign language
ViSiCAST Consortium • Instituut voor Doven • Hamburg University • University of East Anglia • Independent Television Commission • Televirtual • The Post Office • Royal Institute for Deaf People • Institut National des Télécommunications • Institut für Rundfunktechnik
Project Dimensions • Duration • Start : January 2000 • Finish : December 2002 • 36 months • Total Costs • 3770kECU total • 2876kECU funding from EC
ViSiCAST Project Highlights • Prototype enabling text translation and direct synthesis of sign language gestures • Quality assessment support to other EU project • New TESSA system trial at Science Museum, London • Achieved BCS IT Award and Gold Medal • Innovative transmission assessment for broadcast TV • BBC seek to deliver a closed signing service for broadcast DTV • WWW Weather-forecaster with Virtual Signer available in 3 Sign Languages
ViSiCAST Project Structure • Technology: Animation, Linguistics • User Application: Broadcast, WWW, High Street • Evaluation • Exploitation & Dissemination
Technology Focus Objectives • WP 4 Animation • Increased realism in sign generation • Enhanced signing experience • WP5 Sign Language Linguistics • Use of natural sign language • Synthesis of sign language gestures
Animation Work: Objectives • WP4: • Develop high-resolution avatars + related capture, animation and transmission formats, including compression • To enable and support application development in WPs 1, 2 and 3 using the WP4 (& WP5) products • To further develop, compare and integrate both proprietary and standard solutions, where appropriate
Animation: Current Work • Through Year Two • Continued to support application development • Continuous upgrades to the VISIA / TESSA player (OpenGL renderer under ActiveX control) • Bug fixing / motion capture support • .baf format and compression layer with WP1 to create a Broadcast Demonstrator using the ViSiCAST system • MPEG compatibility / parallel development in WP4 and applications
Animation: Continuing & Future Work • Working on ways to improve facial animation / realism (forehead / eyes) • Exploring statistical methods to define and generate facial animation • Working on ways to facilitate avatar creation (photographic acquisition) • Mask 2 + improved motion capture
MPEG-4 Compliant Animation: Achievements • MPEG-4 SNHC for interoperable animation • Making use of an MPEG-4 compliant VISIA model • Compliance with the VRML standard (H-Anim specifications) • Incorporating a full compression layer • 3D mesh & texture encoding • Motion parameter (BAP/FAP) encoding • Typical compression figures: 5 to 25 kbit/s, 7 to 14 bit/vertex (see the sketch below) • Implementing importation and editing tools • Open delivery interface: MPEG-2, IP, ATM ... • MPEG-4 SNHC player and server delivered in June 2001
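As a rough illustration of what the compression figures above imply, the sketch below applies uniform scalar quantisation to per-frame animation parameters and estimates the resulting bit rate. It is a minimal sketch under illustrative assumptions (the parameter count, bit depth and frame rate are invented for the example); the actual MPEG-4 BAP/FAP and mesh codecs also use prediction and entropy coding, which are omitted here.

```python
# Minimal sketch: uniform scalar quantisation of animation parameters.
# Illustrative only -- the real MPEG-4 BAP/FAP codec also uses prediction
# and entropy coding, which this sketch omits.

def quantise(value, lo, hi, bits):
    """Map a parameter in [lo, hi] to an integer code of `bits` bits."""
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)

def dequantise(code, lo, hi, bits):
    """Recover an approximate parameter value from its integer code."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)

def bitrate_kbps(n_params, bits, fps=25):
    """Rough bit rate for n_params parameters per frame at `bits` bits each."""
    return n_params * bits * fps / 1000.0

# e.g. ~100 body/face parameters at 10 bits each, 25 fps -> 25 kbit/s
print(bitrate_kbps(100, 10))
```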
MPEG-4 Compliant Animation: Perspectives • Advanced interoperable distributed animation system • Improved facial animation • MPEG-4 System layer implementation • Multimedia (audio, video, text…) synchronisation • Error resilience • Management of scene description • MPEG-compliant SiGML-driven animation • Open input/output interface
Presentation by Streams - Linguistics • WP 4 Animation • Increased realism in sign generation • Enhanced signing experience • WP5 Sign Language Linguistics • Use of natural sign language • Synthesis of sign language gestures
WP5: Language Technology • Goal within the project: to provide semi-automatic translation from English into BSL, DGS and NGT • Can also be used to assist the user in monolingual language input • No established writing system exists for sign languages
The last year: 3 deliverables • D5-1: Defining the interfaces • D5-2: Transfer to XML: SiGML definition • D5-3: Prototype translation system: English to notation
D5-1: Defining the interfaces • Adaptation of Discourse Representation Structure • Extension of HamNoSys, a phonetic transcription system for sign language • Notation conventions for all non-manual aspects relevant for (European) sign languages • Body movement • Head movement • Facial expressions • Mouthing and Mouth gestures • Eye movement • Synchronicity with manual elements
D5-2: SiGML • Defines an XML domain based on the D5-1 manual and non-manual notation (see the illustrative sketch below) • Simple timing model • Probably to be revised to ease integration with upcoming synchronisation models as required for broadcasting etc. • SMIL, XMT (MPEG-4) etc.
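To make “defines an XML domain” concrete, here is a minimal sketch that builds one sign element programmatically. The element and attribute names used (sign, manual, nonmanual, mouthing, duration_ms) are hypothetical stand-ins for illustration only, not the actual SiGML schema defined in D5-2.

```python
# Illustrative sketch only: the element/attribute names below are
# hypothetical stand-ins, not the real SiGML schema.
import xml.etree.ElementTree as ET

sign = ET.Element("sign", gloss="WEATHER")            # one sign, identified by its gloss
manual = ET.SubElement(sign, "manual")                # manual (hand) component
manual.text = "HamNoSys transcription goes here"
nonmanual = ET.SubElement(sign, "nonmanual")          # non-manual component
ET.SubElement(nonmanual, "mouthing", value="weather")
sign.set("duration_ms", "800")                        # a simple explicit timing model

print(ET.tostring(sign, encoding="unicode"))
```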
D5-3: Prototype text-to-sign notation (pipeline sketched below) • English to semantics (DRS) • CMU parser • DRS construction • Semantics to sign language notation • DRS to HPSG semantics (ALE/MRS) • HPSG generation (ALE/LinGo) • HPSG PHON (HamNoSys) to SiGML
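The shape of that route, as a minimal sketch: each stage is a hypothetical placeholder function, not the project's actual CMU parser or ALE/MRS/LinGo code, and the intermediate data structures are left abstract.

```python
# Hypothetical placeholders sketching the D5-3 pipeline stages; the real
# system uses the CMU parser and the ALE/MRS/LinGo grammar machinery.

def parse_english(sentence):      # English text -> syntactic analysis
    ...

def build_drs(parse_tree):        # syntax -> Discourse Representation Structure
    ...

def drs_to_mrs(drs):              # DRS -> HPSG semantics (MRS)
    ...

def generate_sign_hpsg(mrs):      # semantics -> sign-language HPSG sign (PHON = HamNoSys)
    ...

def hamnosys_to_sigml(sign):      # HamNoSys PHON value -> SiGML notation
    ...

def english_to_sigml(sentence):
    """Compose the stages: English text in, SiGML notation out."""
    return hamnosys_to_sigml(
        generate_sign_hpsg(drs_to_mrs(build_drs(parse_english(sentence)))))
```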
HPSG modelling of sign languages • Aiming at proper sign language, not anything like SEE (Signing Exact English) • No detailed grammars published, no usable dictionaries • Most importantly: data-driven • Lexicon and every aspect of our grammar fragment
Demo: D5-3 plus D4-2 • Due month 26 (Feb 02), i.e. work in progress • Complete route from English to sign language animation
Synthetic Animation of SiGML • Convert avatar-independent SiGML to an avatar-specific description: • Define all SiGML locations (shoulder, eyes, fingertip, etc.) in terms of the avatar's geometry • Define hand shapes in terms of rotations of the hand joints • Determine arm joint rotations from hand positions by inverse kinematics (see the sketch below) • Convert SiGML movements into numerically defined trajectories • Output in BAF format or VRML
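The inverse-kinematics step above can be illustrated with the classic analytic solution for a planar two-link arm. This is a minimal sketch of the core idea only; the real system solves for a full 3D avatar skeleton rather than a single planar chain.

```python
# Minimal sketch: analytic 2-link inverse kinematics in the plane,
# giving shoulder and elbow angles that place the wrist at (x, y).
# The real avatar skeleton is 3D; this only illustrates the principle.
import math

def two_link_ik(x, y, upper_arm, forearm):
    """Shoulder at the origin; returns (shoulder_angle, elbow_angle) in radians."""
    d2 = x * x + y * y
    # Law of cosines gives the elbow bend.
    cos_elbow = (d2 - upper_arm**2 - forearm**2) / (2 * upper_arm * forearm)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))      # clamp unreachable targets
    elbow = math.acos(cos_elbow)
    # Shoulder angle: direction to the target, corrected for the elbow bend.
    shoulder = math.atan2(y, x) - math.atan2(forearm * math.sin(elbow),
                                             upper_arm + forearm * math.cos(elbow))
    return shoulder, elbow

print(two_link_ik(0.4, 0.3, upper_arm=0.3, forearm=0.25))
```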
Biocontrol model • Model each joint by a second-order control system: a muscle applies a torque to the joint, resisted by a moment of inertia and damping • Generate different types of motion (fast, slow, etc.) by varying the model parameters (sketched below)
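A minimal numerical sketch of that model, assuming a spring-like muscle torque towards a target angle and simple Euler integration; the parameter values are illustrative, not the project's calibrated ones.

```python
# Second-order joint model: a muscle torque drives the joint towards a target
# angle, resisted by inertia and damping. Parameter values are illustrative.

def simulate_joint(theta_target, stiffness, damping, inertia,
                   dt=0.01, steps=200, theta=0.0, omega=0.0):
    """Return the joint-angle trajectory over `steps` Euler time steps."""
    trajectory = []
    for _ in range(steps):
        torque = stiffness * (theta_target - theta)   # muscle torque
        alpha = (torque - damping * omega) / inertia  # angular acceleration
        omega += alpha * dt
        theta += omega * dt
        trajectory.append(theta)
    return trajectory

# Varying the parameters changes the character of the motion:
fast = simulate_joint(1.0, stiffness=40.0, damping=4.0, inertia=1.0)  # quick, snappy
slow = simulate_joint(1.0, stiffness=5.0, damping=6.0, inertia=1.0)   # slow, smooth
```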
Ambient motion • If only the hands, arms and face are animated, the result is stiff and lifeless • Animate the spine and head by mixing “ambient motion” from motion-capture files with the synthetic animation (a blending sketch follows below)
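One simple way to realise that mixing is a per-joint blend of the synthetic pose with a pose sampled from a motion-capture file. The sketch below is an assumption about how such blending could work (the joint names and pose representation are invented for the example), not the project's actual implementation.

```python
# Hypothetical sketch: blend ambient motion-capture rotations into joints
# (e.g. spine, head) that the synthetic signing animation does not drive.

def blend_pose(synthetic, ambient, weight=0.3, ambient_joints=("spine", "head")):
    """Per-joint linear blend of two poses given as {joint_name: angle} dicts."""
    blended = {}
    for joint, angle in synthetic.items():
        if joint in ambient_joints and joint in ambient:
            blended[joint] = (1 - weight) * angle + weight * ambient[joint]
        else:
            blended[joint] = angle
    return blended

frame = blend_pose({"spine": 0.0, "head": 0.0, "elbow_r": 1.2},
                   {"spine": 0.05, "head": -0.02})
```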
Closing the feedback loop • So far, only the native signers involved in the project can judge the output of our HPSG generation system, since doing so requires at least an intimate knowledge of HamNoSys • With the animation output, we can draw on the intuitions of far more native signers than we can today • This opens the way to a more formal evaluation of the generation system than has been available to date
Summary: Language Technology • First successful steps in HPSG modelling of sign languages and in translation from English to sign language • Encoded an established and extended sign language notation in a standard description model (XML) • Already close to closing the feedback loop, allowing native signers to evaluate our language production system
Presentation by Streams • Animation and Linguistics • User Applications: Evaluation of broadcast transmission for DTV • Exploitation and Dissemination
Presentation by Streams - Television • WP1 Television • Closed signing for Broadcast DTT • Enhanced signing experience • Regulation and Standards • WP2 Internet • Information and Education for Deaf People • WP3 Face to Face • High Street Post Office Counter Services • Science Museum Trial - Summer 2001
VH on TV: The Advantages • Low transmission rate: < 25 kbit/s • Compatibility with signing on other media and with foreign sign languages • Precise, sharp representation of the signer • Open display options • Compliance with international standards: MPEG, DVB • Future-proof: the cost saving allows a vast number of signed programmes, with no later transition needed from video-based to VH signing
Broadcast VH Signing: Achievements • Integrated TX system for broadcast to STBs (demonstrator completed end of 2000) • Implementing virtual human s/w in the STB • Incorporating a compression layer • Using an MPEG-2 delivery layer for maximum compliance: with existing hardware, with MPEG & DVB standards, and with proprietary formats
Broadcast VH Signing: Functional architecture • [Block diagram] Encoder side: MPEG-2 AV encoder, MPEG-4 SNHC encoder and BAF encoder feeding a packetiser/MUX • Normative MPEG-2 Transport Stream as the delivery layer • Decoder side: deMUX/dePacketiser, MPEG-2 AV decoder, MPEG-4 SNHC decoder and player, BAF decoder and player, compositor • Normative and proprietary paths shown in parallel
Broadcast VH Signing: System layer implementation • [Block diagram] UDP/TCP packetiser and Thomson MPEG encoder produce the MPEG-2 Transport Stream • Delivered via an RF modulator to a DVB receiver card and IP filter on the receiving side, feeding the compositor
Broadcast VH Signing: Versatile delivery architecture • [Block diagram] Content streams (SiGML, scene description, audio, video, text, SNHC, BAF) and content description coded under MPEG-7, MPEG-4, MPEG-2, SiGML or proprietary formats • Multiplexed via FlexMUX, MPEG-2 Packetised Elementary Streams (PES) and sections into a DVB-compliant MPEG-2 Transport Stream (TS)
Broadcast VH Signing: Perspectives • Advanced TX system for broadcast to STBs • Open, MPEG & DVB compliant architecture • Improved synchronisation layer • Integrating a compositing layer • Implementing a complete MPEG-4 multimedia player • Integrating the SiGML stream
Broadcast VH Signing: Targeted architecture • [Block diagram] MPEG-2/4 AV encoder, MPEG-4 SNHC encoder and BAF encoder feeding a packetiser/MUX • Normative MPEG-2 Transport Stream as the delivery layer • Receiving side: deMUX/dePacketiser, MPEG-2/4 AV decoder, MPEG-4 SNHC decoder, BAF decoder, MPEG compositor and multimedia player • Normative and proprietary paths shown in parallel
Presentation by Streams - WWW • WP1 Television • Closed signing for Broadcast DTT • Enhanced signing experience • Regulation and Standards • WP2 Internet • Information and Education for Deaf People • WP3 Face to Face • High Street Post Office Counter Services • Science Museum Trial - Summer 2001
Weather Forecast Application • First WWW application: daily weather forecast in 3 sign languages • content creation • example forecast • evaluation
Evaluation with Deaf users • Subjective quality of signing rated as ‘reasonable’ or ‘good’ • 68% correct or partially correct • Improvement possibilities • mouthing • facial expressions
Mouthing: Scores for signs depending to varying degrees on mouthing
Facial Expressions: Scores for signs depending to varying degrees on facial expressions
Next Steps • Improvements • Beta-testing • online • larger user group • user feedback • Exploitation planning
Presentation by Streams – Face to Face • WP1 Television • Closed signing for Broadcast DTT • Enhanced signing experience • Regulation and Standards • WP2 Internet • Information and Education for Deaf People • WP3 Face to Face • High Street Post Office Counter Services • Science Museum Trial - Summer 2001
WP3: Face-to-face transactions • Research concentrated on TESSA (Text and Sign Support Agent) • Enables Post Office counter clerks to “translate” from (English) speech to sign language • System developments: • Autumn 2000: new system software completed, incorporating IBM ViaVoice speech recognition and an improved avatar • Spring 2001: 200 new signs recorded, processed and added to the system • Spring/Summer 2001: development and testing of an “unconstrained system”