An NLP Application: Designing an English-to-ASL Machine Translation System

Matt Huenerfauth • CSE-391 “Introduction to Artificial Intelligence,” University of Pennsylvania, March 25, 2005 • Research Advisors: Mitch Marcus & Martha Palmer, Computer and Information Science, University of Pennsylvania • Adapted from presentations given at: • The 6th International ACM SIGACCESS Conference on Computers and Accessibility, October 20, 2004, Atlanta, GA • The 10th International Conference on Theoretical and Methodological Issues in Machine Translation, October 4, 2004, Baltimore, MD

English-to-ASL MT • Development of English-to-ASL machine translation (MT) software for accessibility applications has been slow… • Misconceptions: the deaf experience, ASL linguistics, and ASL’s relationship to English. • Challenges: some ASL phenomena are very difficult (but important) to translate. We’ve had to develop some new models for ASL generation.

Misconceptions about Deaf Literacy and ASL MT How have they affected research?

Misconception: All deaf people are written-English literate. • Only half of deaf high school graduates (age 18+) can read English at a fourth-grade (age 10) level, despite ASL fluency. • Many deaf accessibility tools forget that English is a second language for these students (and has a different structure). • Applications for a Machine Translation System: • TV captioning, teletype telephones. • Computer user-interfaces in ASL. • Educational tools using ASL animation. • Access to information/media. (Credit: Audiology Online)

Building tools to address deaf literacy… • What’s our input? English text. • What’s our output? ASL has no written form. • Imagine a 3D virtual reality human being, one that can perform sign language. • But this character needs a set of instructions telling it how to move! • Our job: English → these instructions. (Credit: Vcom3D)

Building tools to address deaf literacy… • We can use an off-the-shelf animated character. • Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al. 2000); Vcom3D Corporation.

Misconception: ASL is just manually performed English. • Signed English vs. American Sign Language. • Some ASL sentences have a structure that is similar to written languages. • Other sentences use the space around the signer to describe the 3D layout of a real-world scene. • Hands indicate movement and location of entities in the scene (using special handshapes). • These are called “Classifier Predicates.”

Example Classifier Predicate: “The car parked between the cat and the house.” (house at Loc#1, car at Loc#2, cat at Loc#3) • [Figure: timeline of the signing, one row per articulator.] • Gaze: at viewer, then Loc#1; at viewer, then Loc#3; at viewer, then eyes follow right hand. • Right hand: sign HOUSE, move to Loc#1; sign CAT, move to Loc#3; sign CAR, trace path of car and stop at Loc#2. • Left hand: to Loc#2. • Note: Facial expression, head tilt, and shoulder tilt not included in this example.

Misconception: Traditional MT software is well-suited to ASL. • Classifier predicates are hard to produce. • 3D paths for the hands, layout of the scene. • Grammar rules & lexicons? Not enough. • No written form of ASL. • Very few English-ASL parallel corpora. • Can’t use machine learning approaches. • Previous systems are only partial solutions. • Some produce only Signed English, not ASL. • None can produce classifier predicates.

Misconception: OK to ignore visual/spatial ASL phenomena. But classifier predicates are important! • CPs are needed to convey many concepts. • Signers use CPs frequently.* • English sentences that produce CPs are the ones that signers often have trouble reading. • CPs are needed for some important applications: • User-interfaces with ASL animation • Literacy educational software * Morford and McFarland. 2003. “Sign Frequency Characteristics of ASL.” Sign Language Studies. 3:2.

ASL MT Challenges: Producing Classifier Predicates • A new set of generation models…

Focus on Classifier Predicates • Previous ASL MT systems have shown promise at handling non-spatial ASL phenomena using traditional MT technologies. • This project will focus on producing the spatially complex elements of the language: Classifier Predicates of Movement and Location (CPMLs). • Since some of these new MT methods for CPMLs are computationally expensive, we’ve proposed a multi-path* MT design. * Huenerfauth, M. 2004. “A Multi-Path Architecture for English-to-ASL MT.” HLT-NAACL Student Workshop.

Multi-Path Architecture* • English input sentences follow one of three paths: • Spatially descriptive English sentences → 3D software → ASL sentence containing a classifier predicate. • Most English sentences → traditional MT software → ASL sentence not containing a classifier predicate. • Sentences that the MT software cannot successfully translate → word-to-sign look-up → Signed English sentence. * Huenerfauth, M. 2004. “A Multi-Path Architecture for English-to-ASL MT.” HLT-NAACL Student Workshop.
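The three-way routing on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say how a sentence is judged “spatially descriptive” or when traditional MT counts as having failed, so both tests are passed in as functions, and the two toy tests below (`toy_is_spatial`, `toy_mt_succeeds`) are hypothetical stand-ins, not part of the described system.

```python
from enum import Enum
from typing import Callable


class Pathway(Enum):
    CLASSIFIER_PREDICATE = "3D software -> ASL sentence with a classifier predicate"
    TRADITIONAL_MT = "traditional MT -> ASL sentence without a classifier predicate"
    SIGNED_ENGLISH = "word-to-sign look-up -> Signed English sentence"


def choose_pathway(
    sentence: str,
    is_spatially_descriptive: Callable[[str], bool],
    traditional_mt_succeeds: Callable[[str], bool],
) -> Pathway:
    """Pick one of the three pathways named on the slide."""
    if is_spatially_descriptive(sentence):
        return Pathway.CLASSIFIER_PREDICATE
    if traditional_mt_succeeds(sentence):
        return Pathway.TRADITIONAL_MT
    # Fallback for sentences the MT software cannot translate.
    return Pathway.SIGNED_ENGLISH


# Hypothetical toy tests, for illustration only.
SPATIAL_CUES = ("between", "next to", "behind")


def toy_is_spatial(sentence: str) -> bool:
    return any(cue in sentence.lower() for cue in SPATIAL_CUES)


def toy_mt_succeeds(sentence: str) -> bool:
    return len(sentence.split()) <= 12


route = choose_pathway(
    "The car parked between the cat and the house.",
    toy_is_spatial,
    toy_mt_succeeds,
)
print(route.name)
```

Keeping the two tests as parameters leaves the expensive 3D pathway reserved for the sentences that need it, which is the motivation the deck gives for a multi-path design.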

CPML Generation Models • Pathway in focus: English input sentences (spatially descriptive) → 3D software → ASL sentence containing a classifier predicate. • What are the representations used in the English-to-CPML pathway?

Design of the CPML Pathway • [Architecture diagram.] • Scene-building chain: English Sentence → Predicate-Argument Structure → 3D Animation Planning Operator → 3D Animation of the Event. • CP generation models: CP Discourse, CP Semantics, CP Syntax, CP Phonology.

CP Generation Models Discussed • Scene Visualization • Discourse • Semantics • Syntax • Phonology (we’ll talk about this one first)

Overall Architecture: Phonological Model • Body parts moving through space: “Articulators.” • [Architecture diagram as on the “Design of the CPML Pathway” slide.]

ASL Phonetics/Phonology • “Phonetic” Representation of Output • Hundreds of animation joint angles. • Traditional ASL Phonological Models • Hand: shape, orientation, location, movement • Some specification of non-manual features. • Tailored to non-CP output: Difficult to specify complex motion paths. CPs don’t use as many handshapes and orientation patterns.

Example Classifier Predicate: “The car parked between the cat and the house.” • [Figure: timeline of the signing, one row per articulator.] • Gaze: at viewer, then Location #1; at viewer, then Location #3; at viewer, then eyes follow right hand. • Right hand: sign HOUSE, move to Loc #1; sign CAT, move to Loc #3; sign CAR, trace path of car and stop at Loc #2. • Left hand: to Location #2. • Note: Facial expression, head tilt, and shoulder tilt not included in this example.

Phonological Model • What is the output? • Abstract model of (somewhat) independent body parts. • “Articulators” • Dominant Hand (Right) • Non-Dominant Hand (Left) • Eye Gaze • Head Tilt • Shoulder Tilt • Facial Expression • What information do we specify for each of these?

Values for Articulators • Dominant Hand, Non-Dominant Hand • 3D point in space in front of the signer • Palm orientation • Hand shape (finite set of standard shapes) • Eye Gaze, Head Tilt • 3D point in space at which they are aimed.
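One way to picture these values is as a small record per articulator. The sketch below is an assumption-laden illustration, not the system's actual data format: the class and field names are invented here, palm orientation is encoded as a direction vector (the slide does not say how it is encoded), and the coordinates are made up. Shoulder tilt and facial expression are left out because the slide gives no values for them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class HandSpec:
    location: Point3D          # 3D point in the space in front of the signer
    palm_orientation: Point3D  # assumed encoding: direction the palm faces
    handshape: str             # one of a finite set of standard shapes


@dataclass
class AimSpec:
    target: Point3D            # 3D point at which gaze or head tilt is aimed


@dataclass
class PhonologicalFrame:
    """Articulator values at one moment; None means 'not specified'."""
    dominant_hand: Optional[HandSpec] = None
    non_dominant_hand: Optional[HandSpec] = None
    eye_gaze: Optional[AimSpec] = None
    head_tilt: Optional[AimSpec] = None


# Hypothetical frame: the right hand shows a vehicle at a made-up point,
# and the eyes are aimed at the same point.
car_point: Point3D = (0.0, 0.0, 0.4)
frame = PhonologicalFrame(
    dominant_hand=HandSpec(car_point, (0.0, -1.0, 0.0), "Sideways 3"),
    eye_gaze=AimSpec(car_point),
)
print(frame.dominant_hand.handshape)
```

A sequence of such frames over time would describe the motion paths that the traditional phonological models find difficult to specify.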

Overall Architecture: Scene Visualization Approach • Converting an English sentence into a 3D animation of an event. • [Architecture diagram as on the “Design of the CPML Pathway” slide.]

Previously-Built Technology • AnimNL System • Virtual reality model of 3D scene. • Input: English sentences that tell the characters/objects in the scene what to do. • Output: An animation in which the characters/objects obey the English commands. Bindiganavale, Schuler, Allbeck, Badler, Joshi, & Palmer. 2000. "Dynamically Altering Agent Behaviors Using Nat. Lang. Instructions." Int'l Conf. on Autonomous Agents. Related Work: Coyne and Sproat. 2001. “WordsEye: An Automatic Text-to-Scene Conversion System.” SIGGRAPH-2001. Los Angeles, CA.

How It Works • English Sentence → Pred-Arg Structure → 3D Animation Planning Operator → 3D Animation of the Event. • We won’t discuss all the details, but one part of the process is important to understand. (We’ll come back to it later.)

Example Step 1: Analyzing English Input • The car parked between the cat and the house. • Syntactic analysis (build a parse tree). • Identify word senses: e.g. park-23 • Identify discourse entities: car, cat, house. • Predicate Argument Structure • Predicate: park-23 • Agent: the car • Location: between the cat and the house

Example Step 2: AnimNL builds 3D scene

Original Image: Simon the Signer (Bangham et al. 2000).

Overall Architecture: Discourse Model • [Architecture diagram as on the “Design of the CPML Pathway” slide.]

Discourse Model Motivations • Preconditions for Performing a CP • (Entity is the current topic) OR (Starting point of this CP is the same as the ending point of a previous CP) • Effect of a CP Performance • (Entity is topicalized) AND (assigned a 3D location) • Discourse Model must record: • topicalized status of each entity • whether a point has been assigned to an entity • whether entity has moved in the virtual reality since the last time the signer showed its location with a CP

Discourse Model • Topic(x) – X is the current topic. • Identify(x) – X has been associated with a location in space. • Position(x) – X has not moved since the last time that it was placed using a CP.

Example Step 3: Setting up Discourse Model • Model includes a subset of the entities in the 3D scene: those mentioned in the text. • All values initially set to false for each entity. • CAR: __ Topic? __ Location Identified? __ Still in Same Position? • HOUSE: __ Topic? __ Location Identified? __ Still in Same Position? • CAT: __ Topic? __ Location Identified? __ Still in Same Position?
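The three flags per entity map naturally onto a small table of booleans. The sketch below is a minimal illustration under assumptions: the class and method names are invented, the precondition test follows the form written in the CP templates (topic, or identified and still in position), and treating “current topic” as exclusive to one entity at a time is an interpretation of the slide, not something it states.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class EntityStatus:
    topic: bool = False       # Topic(x): x is the current topic
    identified: bool = False  # Identify(x): x has a location in space
    positioned: bool = False  # Position(x): x has not moved since placed


class DiscourseModel:
    def __init__(self, entities: Iterable[str]) -> None:
        # All values start out false for every entity mentioned in the text.
        self.status: Dict[str, EntityStatus] = {e: EntityStatus() for e in entities}

    def can_perform_cp(self, entity: str) -> bool:
        s = self.status[entity]
        return s.topic or (s.identified and s.positioned)

    def record_cp(self, entity: str) -> None:
        """Effect of performing a CP: entity is topicalized and located."""
        for s in self.status.values():
            s.topic = False  # assumption: only one current topic at a time
        self.status[entity] = EntityStatus(True, True, True)

    def entity_moved(self, entity: str) -> None:
        """The entity moved in the 3D scene since it was last placed."""
        self.status[entity].positioned = False


model = DiscourseModel(["CAR", "HOUSE", "CAT"])
print(model.can_perform_cp("CAR"))  # nothing is set up yet
model.record_cp("CAT")
print(model.can_perform_cp("CAT"))
```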

Overall Architecture: Semantic Model • Invisible 3D placeholders: “Ghosts.” • [Architecture diagram as on the “Design of the CPML Pathway” slide.]

Semantic Model • 3D representation of the arrangement of invisible placeholder objects in space • These “ghosts” will be positioned based on the 3D virtual reality scene coordinates • Choose the details, viewpoint, and timescale of the virtual reality scene for use by CPs

Example Step 4: Producing Ghost Scene • [Figure: ghost placeholders labeled HOUSE, CAR, and CAT.]
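A ghost scene can be thought of as a mapping from entity names to 3D points. The sketch below is hypothetical throughout: the coordinates are invented, and using the midpoint as the car's final “between” location is an assumption made for illustration (the deck only says the location is calculated from the cat and house positions).

```python
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


def midpoint(a: Point3D, b: Point3D) -> Point3D:
    """Point halfway between a and b (one possible reading of 'between')."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


# Invented coordinates in the space in front of the signer.
ghosts: Dict[str, Point3D] = {
    "HOUSE": (-0.3, 0.0, 0.4),
    "CAT": (0.3, 0.0, 0.4),
}

# Final resting place of the CAR ghost after it parks.
ghosts["CAR"] = midpoint(ghosts["HOUSE"], ghosts["CAT"])
print(ghosts["CAR"])
```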

Overall Architecture: Syntactic Model • Planning-based generation of CPs. • [Architecture diagram as on the “Design of the CPML Pathway” slide.]

CP Templates • Recent linguistic analyses of CPs suggest that they can be generated by: • Storing a lexicon of CP templates. • Selecting a template that expresses the proper semantics and/or shows proper 3D movement. • Instantiating the template by filling in the relevant 3D locations in space. Liddell, S. 2003. Grammar, Gesture, and Meaning in ASL. Cambridge University Press. Huenerfauth, M. 2004. “Spatial Representation of Classifier Predicates for MT into ASL.” Workshop on Representation and Processing of Signed Languages, LREC-2004.

Animation Planning Process • This mechanism is actually analogous to how the AnimNL system generates 3D virtual reality scenes from English text. • Stores templates of prototypical animation movements (as planning operators) • Select a template based on English semantics • Use planning process to work out preconditions and effects to produce a 3D animation of event

A little more about planning… • Planning is a special form of search process in which you specify a goal to be achieved (in some logical language). • You may begin the process in some initial state (also specified in this logical language). • Then, you define a set of “planning operators” which represent actions that you can take. These have preconditions and effects (also specified in this logical language). • This makes it a little different than normal search.

A little more about planning… • During planning, you begin with your goal, and you try different combinations of planning operators to satisfy the goal. These may have preconditions which in turn must be satisfied by other planning operators. • The goal may be decomposed into sets of subgoals, and you can try satisfying each of these individually. An individual planning operator may also specify a complex set of sub-actions. • In the end, you produce a schedule of actions that should be taken in order to achieve the goal.

Example: Database of Templates • Templates shown: WALKING-UPRIGHT-FIGURE, MOVING-MOTORIZED-VEHICLE, LOCATE-BULKY-OBJECT, TWO-APPROACHING-UPRIGHT-FIGURES, LOCATE-SEATED-HUMAN, PARKING-VEHICLE. • Every template on the slide displays the same field text, given here once: • Parameters: g0 (ghost car parking), g1..gN (other ghosts) • Restrictions: g0 is a vehicle • Preconditions: topic(g0) or (ident(g0) and positioned(g0)); for g=g1..gN: (ident(g) and positioned(g)) • Articulator: Right Hand • Location: Follow_location_of(g0) • Orientation: Direction_of_motion_path(g0) • Handshape: “Sideways 3” • Effects: positioned(g0), topic(g0), express(park-23 ag:g0 loc:g1..gN) • Concurrently: PLATFORM(g0.loc.final), EYETRACK(g0)

Example Step 5: Initial Planner Goal • Planning starts with a “goal.” • Express the semantics of the sentence: • Predicate: PARK-23 • Agent: “the car” discourse entity • We know from lexical information that this “car” is a vehicle (some special CPs may apply) • Location: 3D position calculated “between” locations for “the cat” and “the house.”

Example Step 6: Select Initial CP Template PARKING-VEHICLE Parameters: g_0, g_1, g_2 (ghost car & nearby objects) Restrictions: g_0 is a vehicle Preconditions: topic( g_0 ) or ( ident( g_0 ) and position( g_0 )) (ident( g_1 ) and position( g_1 )) (ident( g_2 ) and position( g_2 )) Articulator: Right Hand Location: Follow_location_of( g_0 ) Orientation: Direction_of_motion_path( g_0 ) Handshape: “Sideways 3” Effects: position( g_0 ), topic( g_0 ), express(park-23 agt: g_0 loc: g_1, g_2 ) Concurrently: PLATFORM( g_0.loc.final), EYETRACK( g_0 )

Example Step 7: Instantiate the Template PARKING-VEHICLE Parameters: CAR, HOUSE, CAT Restrictions: CAR is a vehicle Preconditions: topic(CAR) or (ident(CAR) and position(CAR)) (ident(CAT) and position(CAT)) (ident(HOUSE) and position(HOUSE)) Articulator: Right Hand Location: Follow_location_of( CAR ) Orientation: Direction_of_motion_path( CAR ) Handshape: “Sideways 3” Effects: position(CAR), topic(CAR), express(park-23 agt:CAR loc:HOUSE,CAT ) Concurrently: PLATFORM(CAR.loc.final), EYETRACK(CAR)
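Instantiation amounts to substituting discourse entities for the ghost parameters g_0, g_1, g_2 of the generic template from Step 6. The sketch below shows that substitution on a plain dictionary of strings. The dictionary layout and the `instantiate` function are illustrative assumptions, not the system's representation; the field text is copied from the slides.

```python
import re
from typing import Dict, List, Union

Field = Union[str, List[str]]

PARKING_VEHICLE: Dict[str, Field] = {
    "name": "PARKING-VEHICLE",
    "restrictions": ["g_0 is a vehicle"],
    "preconditions": [
        "topic(g_0) or (ident(g_0) and position(g_0))",
        "ident(g_1) and position(g_1)",
        "ident(g_2) and position(g_2)",
    ],
    "articulator": "Right Hand",
    "location": "Follow_location_of(g_0)",
    "orientation": "Direction_of_motion_path(g_0)",
    "handshape": "Sideways 3",
    "effects": ["position(g_0)", "topic(g_0)", "express(park-23 agt:g_0 loc:g_1,g_2)"],
    "concurrently": ["PLATFORM(g_0.loc.final)", "EYETRACK(g_0)"],
}


def instantiate(template: Dict[str, Field], bindings: Dict[str, str]) -> Dict[str, Field]:
    """Replace every ghost parameter (g_0, g_1, ...) with its bound entity."""

    def substitute(text: str) -> str:
        return re.sub(r"g_\d+", lambda m: bindings[m.group(0)], text)

    result: Dict[str, Field] = {}
    for field_name, value in template.items():
        if isinstance(value, list):
            result[field_name] = [substitute(v) for v in value]
        else:
            result[field_name] = substitute(value)
    return result


bindings = {"g_0": "CAR", "g_1": "HOUSE", "g_2": "CAT"}
instance = instantiate(PARKING_VEHICLE, bindings)
for effect in instance["effects"]:
    print(effect)
```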

Example Step 7: Instantiate the Template PARKING-VEHICLE Parameters: CAR, HOUSE, CAT Restrictions: CAR is a vehicle Preconditions: topic(CAR) or (ident(CAR) and position(CAR)) (ident(CAT) and position(CAT)) (ident(HOUSE) and position(HOUSE)) Effects: position(CAR), topic(CAR), express(park-23 agt:CAR loc:HOUSE,CAT ) • [Figure: articulator timeline.] • Gaze: eyes follow right hand. • Right hand: path of car, stop at Loc#2. • Left hand: to Loc#2.

Example Step 8: Begin Planning Process PARKING-VEHICLE Parameters: CAR, HOUSE, CAT Restrictions: CAR is a vehicle Preconditions: topic(CAR) or (ident(CAR) and position(CAR)) (ident(CAT) and position(CAT)) (ident(HOUSE) and position(HOUSE)) Effects: position(CAR), topic(CAR), express(park-23 agt:CAR loc:HOUSE,CAT ) • [Figure: articulator timeline.] • Gaze: eyes follow right hand. • Right hand: path of car, stop at Loc#2. • Left hand: to Loc#2.

Example: Other Templates in the Database • We’ve seen these: • PARKING-VEHICLE • PLATFORM • EYEGAZE • There are also these: • LOCATE-STATIONARY-ANIMAL • LOCATE-BULKY-OBJECT • MAKE-NOUN-SIGN

Example Step 9: Planning Continues… • PARKING-VEHICLE Parameters: CAR, HOUSE, CAT Restrictions: CAR is a vehicle Preconditions: topic(CAR) or (ident(CAR) and position(CAR)) (ident(CAT) and position(CAT)) (ident(HOUSE) and position(HOUSE)) Effects: position(CAR), topic(CAR), express(park-23 agt:CAR loc:HOUSE,CAT ) • Its timeline: Gaze: eyes follow right hand. Right hand: path of car, stop at Loc#2. Left hand: to Loc#2. • LOCATE-STATIONARY-ANIMAL Parameters: CAT Restrictions: CAT is an animal Preconditions: topic(CAT) Effects: topic(CAT), position(CAT), ident(CAT) • Its timeline: Gaze: eyes at cat location. Right hand: move to cat location.

Example Step 9: Planning Continues… • Plan: MAKE-NOUN:“HOUSE”, LOCATE-BULKY-OBJECT, MAKE-NOUN:“CAT”, LOCATE-STATIONARY-ANIMAL, MAKE-NOUN:“CAR”, PARKING-VEHICLE (with EYEGAZE and PLATFORM concurrently). • Conditions shown between the steps: topic(HOUSE), identify(HOUSE), position(HOUSE); topic(CAT), identify(CAT), position(CAT); topic(CAR), identify(CAR).
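The way preconditions pull further operators into the plan can be sketched with a very small backward-chaining planner. This is a simplified illustration, not the system's planner. Its assumptions: facts are plain strings; MAKE-NOUN is taken to establish topic(x); the LOCATE operators use the precondition and effects listed for LOCATE-STATIONARY-ANIMAL; only the topic(CAR) branch of PARKING-VEHICLE's disjunctive precondition is modeled; effects only add facts (a new topic does not cancel an old one); and the concurrent PLATFORM and EYEGAZE actions are omitted.

```python
from typing import List, NamedTuple, Set


class Operator(NamedTuple):
    name: str
    preconditions: List[str]
    effects: List[str]


def make_noun(entity: str) -> Operator:
    # Assumption: signing the noun makes the entity the topic.
    return Operator(f'MAKE-NOUN:"{entity}"', [], [f"topic({entity})"])


def locate(template: str, entity: str) -> Operator:
    return Operator(
        template,
        [f"topic({entity})"],
        [f"topic({entity})", f"position({entity})", f"ident({entity})"],
    )


# MAKE-NOUN operators come first so that topic(x) goals pick them.
OPERATORS: List[Operator] = [
    make_noun("HOUSE"),
    make_noun("CAT"),
    make_noun("CAR"),
    locate("LOCATE-BULKY-OBJECT", "HOUSE"),
    locate("LOCATE-STATIONARY-ANIMAL", "CAT"),
    Operator(
        "PARKING-VEHICLE",
        ["ident(HOUSE)", "position(HOUSE)", "ident(CAT)", "position(CAT)", "topic(CAR)"],
        ["position(CAR)", "topic(CAR)", "express(park-23)"],
    ),
]


def achieve(goal: str, state: Set[str], plan: List[str]) -> None:
    """Satisfy goal by recursively satisfying an operator's preconditions."""
    if goal in state:
        return
    for op in OPERATORS:
        if goal in op.effects:
            for precondition in op.preconditions:
                achieve(precondition, state, plan)
            plan.append(op.name)
            state.update(op.effects)
            return
    raise ValueError(f"no operator achieves {goal}")


def plan_for(goal: str) -> List[str]:
    state: Set[str] = set()  # initial state: every discourse flag is false
    plan: List[str] = []
    achieve(goal, state, plan)
    return plan


for step in plan_for("express(park-23)"):
    print(step)
```

Under these assumptions the steps come out in the same order as the plan on the slide. That order depends on how the preconditions of PARKING-VEHICLE are listed here, which is a choice made for this sketch.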

Example Step 10: Build Phonological Spec • Plan: MAKE-NOUN:“HOUSE”, LOCATE-BULKY-OBJECT, MAKE-NOUN:“CAT”, LOCATE-STATIONARY-ANIMAL, MAKE-NOUN:“CAR”, PARKING-VEHICLE, EYEGAZE, PLATFORM. • [Figure: timeline with rows Gaze, Right, Left.] • Gaze: at viewer, at Loc#1, at viewer, at Loc#3, at viewer, follow car. • Noun signs: HOUSE, CAT, CAR.

Wrap-Up and Discussion
