Handling Spatially Complex English-to-ASL MT with a Multi-Path Pyramidal Architecture Matt Huenerfauth CLUNCH Presentation November 3, 2003
ASL Machine Translation with Pyramids and Invisible Worlds Matt Huenerfauth CLUNCH Presentation November 3, 2003
Today’s Talk This is work in progress. • ASL Linguistics and Machine Translation • Initial Approaches to ASL MT • Handling Spatially Complex ASL • A Multi-Path MT Architecture. • Adopting some HMS lab technology. • Interesting Linguistic Motivations. • Current and Future Work
Motivations and Applications • Only half of deaf high school graduates can read English at a fourth-grade level, despite sophisticated ASL fluency. • Many efforts to help deaf people access the hearing world forget that English is their second language (and that it differs from ASL). • Applications for a Machine Translation System: • TV captioning, teletype telephones. • Human interpreters can be intrusive or expensive. • Educational tools, access to information. • Storage and transmission of ASL.
Output: Signing Virtual Humans • Virtual reality models of the human form are now articulate and fast enough to produce ASL. • An ASL generator produces instructions for the avatar, and the avatar performs the signs, producing animated output for the user. • Our problem is how to build these instructions.
Virtual Signing Humans Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al. 2000); Vcom3D Corporation
ASL Linguistics I • What is ASL? • Is it a real language? Who uses it? • Different from SEE or SSE. • How is it different from English? • Grammar, Vocabulary, Visual/Spatial modality. • More than the Hands: Simultaneity! • How signs can be changed: Morphology! • Use of the Space around the Signer…
ASL Linguistics II • Discourse Space • Put discourse entities on “shelves” for later referential use. • “Agreement” - Pronouns, Possessives, Verbs. • Don’t interpret locations literally. (Bob to the left of Tim.) • Three-Dimensional Space • Space around signer is visually analogous to a real scene. • Classifier Predicates • Signers describe 3D scenes with their hands. • Meaningful handshape and 3D representative movement path.
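A minimal sketch of the discourse-space bookkeeping described above, assuming a small fixed set of loci. The locus names, the `IX-` pronoun format, the `source-VERB-goal` agreement format, and the `DiscourseSpace` class are all hypothetical illustrations, not part of the system described in this talk. The loci are arbitrary "shelves," not literal positions: placing BOB on the right does not mean Bob is to the right of Tim.

```python
from dataclasses import dataclass, field

# Hypothetical names for locations ("shelves") in the signing space.
LOCI = ["right", "left", "center-right", "center-left"]


@dataclass
class DiscourseSpace:
    """Toy model of the discourse space: each entity gets its own locus."""

    assignments: dict = field(default_factory=dict)

    def place(self, entity: str) -> str:
        """Assign the entity to a free locus (or return its existing one)."""
        if entity in self.assignments:
            return self.assignments[entity]
        used = set(self.assignments.values())
        for locus in LOCI:
            if locus not in used:
                self.assignments[entity] = locus
                return locus
        raise RuntimeError("no free locus left in the signing space")

    def pronoun(self, entity: str) -> str:
        """A pronoun is a point (IX) toward the entity's locus."""
        return f"IX-{self.assignments[entity]}"

    def agreeing_verb(self, verb: str, source: str, goal: str) -> str:
        """An agreement verb moves from the source's locus to the goal's locus."""
        return f"{self.assignments[source]}-{verb}-{self.assignments[goal]}"


space = DiscourseSpace()
space.place("BOB")
space.place("TIM")
print(space.pronoun("BOB"), space.agreeing_verb("GIVE", "BOB", "TIM"))
# IX-right right-GIVE-left
```

Once an entity is placed, later pronouns and agreement verbs simply look up its locus, which is the "later referential use" the slide mentions.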
ASL Linguistics III • Traditional Sentences (no classifier predicates): Where does Billy attend college? wh #BILLY IXx GO-TO UNIVERSITY WHERE • Spatially Complex (uses classifier predicates): I parked my car next to his cat. POSSx CAT ClassPred-bent-V-{locate cat in space} POSS1s CAR ClassPred-3-{park next to cat} • The truck drove down the windy road. IXx TRUCK ClassPred-3-{drive on windy road}
Initial Approaches to ASL MT Non-statistical Direct and Transfer MT Architectures
Corpora for ASL? • ASL has no written form, so there are no newswires or other ready-made sources of text. • Some groups have attempted to record and annotate videotapes, but creating a useful and consistent manual transcription standard, and then performing the transcription, makes for very slow work. • As a result, there are no statistical approaches to ASL MT.
MT Pyramid Dorr 1998. Machine Translation Pyramid • Options in MT design. • No statistics? Then taking a higher path on the pyramid means: • more work • a smaller domain size • subtler divergences handled
Option 1: Direct Translation • What kind of non-statistical translation is possible if all we do is word-level analysis (i.e., morphology, POS tagging, and sense tagging)? • A word-for-sign dictionary look-up system. • Probably not a sophisticated enough analysis to produce ASL, but it could produce SEE.
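A minimal sketch of word-for-sign look-up, assuming a tiny hand-built dictionary. The `LEXICON` entries and the `#` prefix for fingerspelled fallbacks are hypothetical illustrations; a real system would also need the morphology, POS, and sense tagging mentioned above.

```python
# Hypothetical English-word-to-sign dictionary (glosses are illustrative).
LEXICON = {
    "where": "WHERE",
    "does": "DO",
    "billy": "#BILLY",     # '#' marks a fingerspelled name
    "attend": "GO-TO",
    "college": "UNIVERSITY",
}


def direct_translate(sentence: str) -> list[str]:
    """Word-for-sign look-up that keeps English word order (closer to SEE than ASL)."""
    signs = []
    for token in sentence.lower().strip("?.!").split():
        # Fall back to fingerspelling (marked with '#') for unknown words.
        signs.append(LEXICON.get(token, "#" + token.upper()))
    return signs


print(direct_translate("Where does Billy attend college?"))
# ['WHERE', 'DO', '#BILLY', 'GO-TO', 'UNIVERSITY']
```

Compared with the ASL gloss on the ASL Linguistics III slide (#BILLY IXx GO-TO UNIVERSITY WHERE), this output keeps English word order, keeps "does," and lacks both the index and the wh marking, which is why direct translation lands closer to SEE.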
Option 2: Transfer Translation • Syntactically analyze the English text before crossing over to ASL. • Captures more divergences and handles more complex phenomena. • Can successfully translate many English sentences into ASL. • Some previous work along these lines. • Some systems use deep syntax or simple semantics.
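A minimal sketch of one transfer rule, assuming the English analysis has already produced a lemmatized clause with do-support removed. The `Clause` structure, the lexicon, and the rule itself (wh-sign moved to sentence-final position, plus a "wh" non-manual marker) are hypothetical illustrations modeled on the "Where does Billy attend college?" example; the sketch omits the IXx index from that gloss, and a real transfer component would operate on full parse trees.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lemma-to-gloss lexicon (illustrative entries only).
LEXICON = {"billy": "#BILLY", "attend": "GO-TO", "college": "UNIVERSITY", "where": "WHERE"}


@dataclass
class Clause:
    """Hypothetical output of English syntactic analysis: lemmas by role.
    Do-support ("does") has already been stripped by the analysis."""
    subject: str
    verb: str
    obj: Optional[str] = None
    wh_word: Optional[str] = None  # set when the clause is a wh-question


def gloss(lemma: str) -> str:
    return LEXICON.get(lemma, "#" + lemma.upper())


def transfer(clause: Clause) -> tuple[Optional[str], list[str]]:
    """Transfer rule: SVO order, wh-sign moved to sentence-final position,
    and a 'wh' non-manual marker returned alongside the manual signs."""
    signs = [gloss(clause.subject), gloss(clause.verb)]
    if clause.obj is not None:
        signs.append(gloss(clause.obj))
    if clause.wh_word is not None:
        signs.append(gloss(clause.wh_word))
        return "wh", signs
    return None, signs


print(transfer(Clause("billy", "attend", "college", wh_word="where")))
# ('wh', ['#BILLY', 'GO-TO', 'UNIVERSITY', 'WHERE'])
```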
Transfer Issues for ASL • ASL Discourse Model: topics, referents in space. • Representing & Generating Non-Manual Signals. • Computational Model of ASL Phonology • facilitate creation of an ASL lexicon • define morphological and phonological operations • Parameterizing ASL Features for Morphology • Note: if the system can't handle a particular input, it can just fall back on direct translation, producing signing output closer to SEE than to fluent ASL.
Handling Spatially Complex ASL Failings of direct and transfer approaches to ASL MT.
But what’s the hard part? • Previous ASL generation work has ignored spatially complex ASL sentences. • Classifier predicates and spatial verbs • Very common, very communicatively useful. • Difficult to handle in transfer architecture. (More going on than just syntax with these.)
Translate to a Classifier Predicate: The car drove down the bumpy road past my house. POSS1s HOUSE ClassPred-C-{locate house} IXx CAR ClassPred-3-{drive on bumpy road} • Where are the house, the road, and the car? How close are they? Where does the path start and stop? How do we show that the path is bumpy, winding, or hilly?
Paralinguistic? Iconic? Spatial? • Linguists debate whether classifier predicates are: • Paralinguistic, visually iconic gestural movements • Complex, non-spatial polymorphemic constructions • Semantically compositional yet still spatially aware • Pushing the boundaries of ‘language’… • May involve gradient information, spatial analogy, scene visualization, and a degree of iconicity. • Not clear that traditional linguistic approaches can capture this. • Still seems linguistic, however: there are many constraints…
When the going gets tough… • …the tough try an interlingua. • Hard to address using only the morphological, syntactic, and simple semantic information in the English text. • Direct and transfer architectures appear insufficient. • What about an interlingual approach? • Problem: it is hard to build an interlingual system for an unlimited (or even medium-sized) domain. Lots of overhead! • Interlingual systems are built only for limited domains.
Getting by with a limited domain? • What's special about ASL: we can identify the ‘hard’ sentences. • Spatially descriptive text: English spatial verbs describing locations, orientations, or movements; spatial prepositions or adverbs; concrete or animate entities; other common motifs or situations in which classifier predicates are used (detected lexically). • Use a broad-coverage transfer approach for most inputs, and detect when we need something more powerful, i.e., when we have a spatially complex English input sentence.
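A minimal sketch of lexical detection, assuming three small cue lists and a simple threshold on how many kinds of cue appear. The word lists, the threshold of 2, and the function name are hypothetical; the slide only says detection is lexical, and a usable detector would need much larger, curated lexicons and likely POS information.

```python
import re

# Hypothetical cue lexicons; a real detector would need much larger curated lists.
SPATIAL_VERBS = {"drive", "drove", "park", "parked", "walk", "walked", "roll", "rolled"}
SPATIAL_PREPS_ADVS = {"next", "beside", "down", "past", "across", "behind", "toward", "up"}
CONCRETE_ENTITIES = {"car", "truck", "cat", "dog", "house", "road", "table"}

CUE_SETS = (SPATIAL_VERBS, SPATIAL_PREPS_ADVS, CONCRETE_ENTITIES)


def is_spatially_complex(sentence: str, threshold: int = 2) -> bool:
    """Flag a sentence for the interlingual path if at least `threshold`
    different kinds of lexical cue occur in it."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    kinds_found = sum(1 for cues in CUE_SETS if any(t in cues for t in tokens))
    return kinds_found >= threshold


print(is_spatially_complex("The truck drove down the windy road."))  # True
print(is_spatially_complex("Where does Billy attend college?"))      # False
```

Requiring more than one kind of cue is one way to avoid sending every sentence that merely mentions a car down the expensive path; the right threshold would have to be tuned.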
“Multi-Path” MT? • Whenever possible, use the simpler, easier-to-build MT approach. • Only when needed, use the more sophisticated, resource-intensive one. • We take advantage of the ‘breadth’ of one and the ‘depth’ of the other. • If we add direct translation (to SEE) to the picture, we actually have three pathways.
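A minimal sketch of the three-pathway dispatch, assuming each pathway is a function that returns ASL output or `None` when it cannot handle the input. The function signatures, the `None` convention, and the choice to fall back from the interlingual path to transfer when the interlingua fails are assumptions for illustration; the stub pathways in `route` are placeholders, not real translators.

```python
from typing import Callable, Optional

# Each pathway maps English text to ASL output (a gloss string here),
# or returns None when it cannot handle the input.
Pathway = Callable[[str], Optional[str]]


def multi_path_translate(
    sentence: str,
    is_spatial: Callable[[str], bool],
    interlingua: Pathway,
    transfer: Pathway,
    direct: Callable[[str], str],
) -> tuple[str, str]:
    """Return (pathway name, output), preferring the cheapest adequate path."""
    # Deep but narrow: only spatially complex input goes to the interlingual path.
    if is_spatial(sentence):
        output = interlingua(sentence)
        if output is not None:
            return "interlingua", output
    # Broad coverage: the transfer path handles most other input.
    output = transfer(sentence)
    if output is not None:
        return "transfer", output
    # Last resort: word-for-sign output, closer to SEE than fluent ASL.
    return "direct", direct(sentence)


def route(text: str) -> tuple[str, str]:
    # Stub pathways for illustration only.
    return multi_path_translate(
        text,
        is_spatial=lambda s: "drove" in s,
        interlingua=lambda s: "IXx CAR ClassPred-3-{drive on bumpy road}",
        transfer=lambda s: None if "?" in s else "TRANSFER OUTPUT",
        direct=lambda s: s.upper(),
    )


print(route("The car drove down the bumpy road."))
# ('interlingua', 'IXx CAR ClassPred-3-{drive on bumpy road}')
```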
MT Pyramid Dorr 1998. “Pyramidal” MT? Don’t interpret this picture as a set of options anymore… Now it’s a skeleton for a multi-path MT architecture.
This sounds rather ambitious… How could the computer model spatial reality? What is our Interlingua? • What is the language-neutral representation between the English and ASL when talking about a spatially complex scene? • Intuitively, the signer has a visualization of the 3D scene which they are discussing. • So, a spatial representation of reality (or the signer’s imagination/conception of this reality) is serving as the interlingua.
What about Virtual Reality? • Analyze the English text, construct a 3D virtual reality representation of the scene, and use the VR scene as the basis for generating the spatially iconic classifier predicate movements. • But has anyone ever attempted to construct a 3D virtual reality representation of a changing scene as it is described by English sentences? • Actually, the University of Pennsylvania has.
A Useful Technology Natural Language Command and Control of Virtual Reality Scenes
HMS & NLP Labs: 3D Scene NL-Command • Have a virtual reality model of characters and objects in a three-dimensional scene. • Accepts English text input (directions for the characters or objects to follow). • Produces an animation in which the characters obey the English commands. • Updates the 3D scene to show changes. Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Schuler. 2003.
An NL-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html
NL Command and Control (from input to output): English Text • English Syntax Analysis • Selecting a PAR Template from the Actionary (PAR Templates for Entity Motions) and Filling-In Slots • Filled-In PAR • Hierarchical Planning: handle ambiguities, add more detail… • Animation Script • Animated 3D Scene
What’s a PAR? “Actionary” = Action Dictionary = List of PAR Templates (the PAR Templates for Entity Motions consulted when selecting a template and filling in its slots).
Parameterized Action Representation (annotated groupings: Arguments; Specify Locomotion; Adjuncts; Verb; Planning Operator)
participants: [ agent: AGENT; objects: OBJECT list ]
semantics: [ motion: {Object, Translate?, Rotate?}; path: {Direction, Start, End, Distance}; termination: CONDITION; duration: TIME-LENGTH; manner: MANNER ]
start: TIME
prep conditions: CONDITION boolean-exp
sub-actions: sub-PARs
parent action: PAR
previous action: PAR
next action: PAR
This is a subset of PAR info. http://hms.upenn.edu/software/PAR
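A minimal sketch of the PAR subset above as Python dataclasses, together with a one-entry "actionary" and a slot-filling helper. Field names follow the listing; the field types, the `drive` template, and `fill_template` are assumptions for illustration and do not reproduce the HMS lab's actual PAR implementation.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Optional

Point = tuple[float, float, float]


@dataclass
class Path:
    direction: Optional[str] = None
    start: Optional[Point] = None
    end: Optional[Point] = None
    distance: Optional[float] = None


@dataclass
class PAR:
    """Subset of the PAR fields listed above; field types are assumptions."""
    name: str
    agent: Optional[str] = None
    objects: list[str] = field(default_factory=list)
    translate: bool = False          # motion: does the object translate?
    rotate: bool = False             # motion: does the object rotate?
    path: Path = field(default_factory=Path)
    termination: Optional[str] = None
    duration: Optional[float] = None
    manner: Optional[str] = None
    start: Optional[float] = None
    prep_conditions: list[str] = field(default_factory=list)
    sub_actions: list[PAR] = field(default_factory=list)
    parent_action: Optional[PAR] = None
    previous_action: Optional[PAR] = None
    next_action: Optional[PAR] = None


# A one-entry "actionary" (action dictionary) of PAR templates.
ACTIONARY = {"drive": PAR(name="drive", translate=True, manner="driving")}


def fill_template(verb: str, agent: str, start: Point, end: Point) -> PAR:
    """Select a template from the actionary and fill in its slots."""
    par = copy.deepcopy(ACTIONARY[verb])  # never mutate the shared template
    par.agent = agent
    par.path = Path(start=start, end=end)
    return par


drive = fill_template("drive", "car", (0.0, 0.0, 0.0), (5.0, 0.0, 0.0))
print(drive.agent, drive.path.end)  # car (5.0, 0.0, 0.0)
```

Copying the template before filling it keeps the actionary entry reusable for the next sentence that mentions driving.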
MT Approach to Classifier Predicates Using the HMS NL Command and Control Technology
Using this technology… http://hms.upenn.edu/software/PAR/images.html An NL-Controlled 3D Scene
Using this technology… Original image from: Simon the Signer (Bangham et al. 2000.) An NL-Controlled 3D Scene Signing Character
“Invisible World” Approach • Mini VR scene in front of the signer containing entities from English text. (They’re invisible.) • Interpret the English sentences as NL commands. Instantiate PARs which position, move, reorient, and otherwise modify the entities in this world. • Update VR model. • Use hand to show changes in the invisible scene. • VR acts as intermediary between English & ASL.
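A minimal sketch of the "invisible world" idea, assuming straight-line movements and a fixed linear mapping from scene coordinates to hand positions in front of the signer. The `origin` and `scale` values, the sampling into `steps`, and the class itself are hypothetical; the real approach would derive movements from filled-in PARs and hand them to the classifier predicate generator rather than emit raw coordinates.

```python
from dataclasses import dataclass, field

Vec = tuple[float, float, float]


@dataclass
class InvisibleWorld:
    """Mini 3D scene placed in front of the signer (entities are never drawn)."""
    positions: dict[str, Vec] = field(default_factory=dict)
    origin: Vec = (0.0, 0.3, 0.4)  # hypothetical scene centre in signing space (metres)
    scale: float = 0.05            # hypothetical metres of signing space per scene unit

    def to_signing_space(self, p: Vec) -> Vec:
        """Map a scene point to a hand position in front of the signer."""
        return tuple(o + self.scale * c for o, c in zip(self.origin, p))

    def place(self, entity: str, p: Vec) -> None:
        self.positions[entity] = p

    def move(self, entity: str, end: Vec, steps: int = 4) -> list[Vec]:
        """Apply a movement (e.g. from a filled-in PAR), update the scene, and
        return the hand path that shows the change, as straight-line samples."""
        start = self.positions[entity]
        path = []
        for i in range(steps + 1):
            t = i / steps
            scene_point = tuple(s + t * (e - s) for s, e in zip(start, end))
            path.append(self.to_signing_space(scene_point))
        self.positions[entity] = end
        return path


world = InvisibleWorld()
world.place("house", (0.0, 0.0, 0.0))
world.place("car", (-4.0, 0.0, 0.0))
hand_path = world.move("car", (4.0, 0.0, 0.0))  # car drives past the house
```

Because the scene is updated after each movement, a later sentence about the car starts from its new position, which is how the VR model acts as the intermediary between the English and the ASL.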
Original image: MT Pyramid Dorr 1998. Interlingual Pathway for ASL Our MT picture… We now have an interlingual pathway.
Interlingual Pathway for ASL The NL-Command Technology
Interlingual Pathway for ASL This step is harder than it seems…
VR Scene Doesn’t Do It All • Various factors aside from the movement of the scene itself can affect this generation choice: • conventional motifs of expression • e.g. furniture or items in a room • restrictions on use of multiple hands simultaneously • handshape-movement combination constraints • e.g. ‘approaching’ constructions • discourse or semantic concerns/priorities, etc. • There’s generation work to be done!
An NL Engineering Solution • How do we create the classifier predicates from the VR scene? • Write rules that obey these restrictions, inspect the VR scene, consider the English text semantics, and combine many small units/morphemes to gradually produce or narrow in on a classifier predicate output. • Easier approach: Lexicalize classifier predicates as much as possible. Define and specify a big list of classifier predicate templates (their performance and semantics). Fill slots based on information in the VR scene. • As in the HMS system: to define the set of possible movement templates, build a PAR “actionary” specifying the animation possibilities.
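A minimal sketch of the lexicalized approach, assuming each classifier predicate template records a gloss, a handshape, the entity types it can represent, and the kind of scene change it expresses. The template entries echo the examples in this talk, but the matching rule (first template whose entity type and event both match) and the two location slots are simplifying assumptions; they ignore the two-handed, motif, and discourse constraints listed on the previous slide.

```python
from dataclasses import dataclass
from typing import Optional

Vec = tuple[float, float, float]


@dataclass(frozen=True)
class CPTemplate:
    """Hypothetical lexicalized classifier-predicate template."""
    gloss: str                  # gloss in the notation used above
    handshape: str              # classifier handshape
    entity_types: frozenset     # entity types this handshape can represent
    event: str                  # kind of scene change the template expresses


# Illustrative entries echoing the examples in this talk.
TEMPLATES = [
    CPTemplate("ClassPred-3-{drive}", "3", frozenset({"car", "truck"}), "move"),
    CPTemplate("ClassPred-bent-V-{locate}", "bent-V", frozenset({"cat"}), "locate"),
    CPTemplate("ClassPred-C-{locate}", "C", frozenset({"house"}), "locate"),
]


def select_and_fill(
    entity_type: str, event: str, start: Vec, end: Optional[Vec] = None
) -> Optional[dict]:
    """Pick the first matching template and fill its location slots from the VR scene."""
    for template in TEMPLATES:
        if entity_type in template.entity_types and template.event == event:
            return {
                "gloss": template.gloss,
                "handshape": template.handshape,
                "start": start,
                "end": end if end is not None else start,  # static for 'locate'
            }
    return None  # no template fits: this input needs other handling


print(select_and_fill("truck", "move", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
```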
A Second Actionary: For ASL • The first actionary (list of PAR templates) we saw was used while analyzing the English text. It listed the possible types of movements that the imaginary entities perform in the virtual reality scene. • This second actionary would describe the possible movements of the signer’s hands while performing one or more interrelated classifier predicates (and their discourse/semantic effects). Original image from: Simon the Signer (Bangham et al. 2000.)
Interlingual Pathway for ASL This step could be hard…
Interlingual Pathway for ASL We now have an architecture for the interlingual pathway!