American Sign Language Natural Language Generation and Machine Translation Systems. Written Preliminary Examination II, Computer and Information Science, University of Pennsylvania, September 11, 2003. Matt Huenerfauth. Committee: Norm Badler (Chair), Martha Palmer, Mitch Marcus.
Overview • Introduction • Motivations and Applications • American Sign Language Linguistics • English-to-ASL Problem • The Four Systems • Greatest Strength of Each System • Point-by-Point Comparison • Conclusions and Future Directions
Motivations and Applications • English and ASL are very different languages, but many approaches to helping deaf people access the hearing world forget that English is their second language. • Half of deaf high school graduates read English below a fourth-grade level (Holt 1991), but the vast majority have sophisticated fluency in American Sign Language. • Applications: • TV captioning, teletype telephones. • Human interpreters can be intrusive and expensive. • Educational tools, access to information. • Storage and transmission of ASL.
ASL Linguistics I • What is ASL? • Real language? Who uses it? • Different than SEE or SSE. • How is it different than English? • Grammar, Vocabulary, Visual/Spatial. • More than the Hands: Simultaneity! • How signs can be changed: Morphology! • Use of Space around the Signer…
ASL Linguistics II • Discourse Space • Put things on “shelves” for later use. • Spatial positioning of entities in the discourse (people or things you are talking about). • “Agreement”: Pronouns, Possessives, Standard Verbs, Special Agreeing Verbs. • Three-Dimensional Space • Pretend there’s a little 3-D scene in front of you. • Some verbs incorporate a 3-D path of motion. • Classifier Predicates: describe 3-D scenes.
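The “shelf” idea can be sketched as a small data structure: each new discourse entity is assigned a free locus in front of the signer, and later pronouns and agreeing verbs look that locus up instead of repeating the noun. This is a minimal sketch under stated assumptions: the `DiscourseSpace` class, the four named loci, and the `IX-locus` / `subject-VERB-object` output notation are hypothetical. None of the surveyed systems is claimed to work this way, and real signing space is continuous and 3-D.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical named loci in the signing space, handed out in this order.
LOCI = ["right", "left", "center-right", "center-left"]


@dataclass
class DiscourseSpace:
    """Tracks which locus ("shelf") each discourse entity has been placed on."""
    assignments: Dict[str, str] = field(default_factory=dict)

    def place(self, entity: str) -> str:
        # An entity that is already on a shelf keeps its locus.
        if entity in self.assignments:
            return self.assignments[entity]
        used = set(self.assignments.values())
        for locus in LOCI:
            if locus not in used:
                self.assignments[entity] = locus
                return locus
        raise RuntimeError("no free locus left in this simplified model")

    def pronoun(self, entity: str) -> str:
        # A pronoun is realized as an index (point) toward the entity's locus.
        return f"IX-{self.assignments[entity]}"

    def agreeing_verb(self, verb: str, subject: str, obj: str) -> str:
        # An agreeing verb moves from the subject's locus to the object's locus.
        return f"{self.assignments[subject]}-{verb}-{self.assignments[obj]}"


space = DiscourseSpace()
space.place("BOB")
space.place("MARY")
print(space.pronoun("MARY"), space.agreeing_verb("GIVE", "BOB", "MARY"))
# prints: IX-left right-GIVE-left
```

GIVE is used only as a familiar example of a verb whose movement runs between two loci.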
ASL Linguistics III • Traditional Sentences: • ASL without 3-D classifier predicates. • Example: Where does Bob attend college? → wh #BOB IXx GO-TO UNIVERSITY WHERE • Spatially Complex Sentences: • English into ASL using classifier predicates. • Example: I parked my car next to his truck. → POSSx TRUCK ClassPred-3-{parking the truck} POSS1s CAR ClassPred-3-{park next to truck}
ASL Lacks a Writing System • Example gloss: ____ndiHELPPRO.2 PRO.2 MAN Ixi • Glosses: • Glosses are used by linguists, not by signers. All notations omit some details of the original. • Without a writing system, generating ASL is analogous to going straight from semantics to speech for a spoken language, with no written-text boundary in between. • To compensate, each system in the survey develops its own writing formalism.
Animated Virtual Humans • Virtual Model of the Human Form • Can be Articulated to Produce ASL • An ASL Generator produces instructions for the avatar, and the avatar performs the signs, producing an animated output for the user to view. • Our problem is how to build these instructions.
Virtual Signing Humans • Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al. 2000); Vcom3D.
Zhao et al. 2000 TEAM System
TEAM System • At University of Pennsylvania: • XTAG English Grammar • Synchronous TAG translation • HMS lab human modeling technology • Parallel Transition Network animation specification facilitates sign flexibility & blending. • Uses the EMOTE animation “manner” parameterization approach to elegantly encode ASL adverbials, aspect, and morphology.
TEAM: Motion Parameterization • ASL morphology, aspect, and adverbials are often expressed via subtle modifications to the performance of signs. • Specifying these changes can be laborious. • HMS already had a “motion manner” specification approach based on Laban Movement Analysis. • One or two input values control the manner of the character’s movement. • Defining ASL operators in terms of these made some phenomena easier to specify and implement.
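The idea of driving many low-level animation changes from one or two high-level inputs can be sketched as a single function over a sign's keyframes. This is not EMOTE's actual parameter set, which is based on Laban Effort and Shape factors. The `modulate` function, its `speed` and `size` parameters, and the keyframe format are hypothetical, chosen only to show how one operator can modulate a whole sign.

```python
from typing import List, Tuple

# (time in seconds, hand position (x, y, z)); this format is an assumption.
Keyframe = Tuple[float, Tuple[float, float, float]]


def modulate(sign: List[Keyframe], speed: float = 1.0, size: float = 1.0) -> List[Keyframe]:
    """Return a modulated copy of a sign's keyframes.

    speed > 1 performs the sign faster (keyframe times are compressed);
    size > 1 enlarges the movement around the sign's starting position.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    t0, origin = sign[0]
    result = []
    for t, pos in sign:
        new_t = t0 + (t - t0) / speed
        new_pos = tuple(o + size * (p - o) for p, o in zip(pos, origin))
        result.append((new_t, new_pos))
    return result


# Hypothetical sign: the hand moves 0.2 units to the right over 0.5 s.
SIGN = [(0.0, (0.0, 0.0, 0.0)), (0.5, (0.2, 0.0, 0.0))]

# A hypothetical "intensified" operator: one call changes every keyframe.
print(modulate(SIGN, speed=2.0, size=1.5))
```

The design point is that an adverbial or aspectual operator becomes a pair of numbers rather than a hand-edited copy of every sign it can apply to.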
Marshall & Sáfár 2002. ViSiCAST System
ViSiCAST System • University of East Anglia. • European Union Project. • English text to British Sign Language. • Uses CMU Link Parser and Head-Driven Phrase Structure Rules for Generation. • Uses an expressive XML-compatible sign-language-specific animation control language.
ViSiCAST: DRS • Discourse Representation Structure: • Lists all entities in the discourse. • Stores simple propositions for semantics. • Great for ASL: • Can track all entities placed in space around the signer. • Foundation for reference resolution algorithms. Very important for English-to-ASL! • Propositions de-aggregate information, remove English syntax bias, and isolate tense, aspect, and other modifiers that ASL expresses at varied levels.
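A DRS can be approximated as a set of discourse referents plus a list of flat predicate-argument conditions. The sketch below hand-builds one for the earlier example “I parked my car next to his truck”. The `DRS` class, the referent names, and the predicate names are hypothetical and much simpler than ViSiCAST's actual DRS-to-HPSG pipeline. The point is only the two properties listed on the slide: every entity is listed (so each can be given a location in signing space), and tense sits in its own condition instead of in English verb morphology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

TENSES = ("past", "present", "future")


@dataclass
class DRS:
    """Discourse referents plus flat predicate-argument conditions."""
    referents: Dict[str, str] = field(default_factory=dict)  # name -> "entity" or "event"
    conditions: List[Tuple[str, ...]] = field(default_factory=list)

    def new(self, name: str, kind: str = "entity") -> str:
        self.referents[name] = kind
        return name

    def add(self, predicate: str, *args: str) -> None:
        self.conditions.append((predicate, *args))

    def entities(self) -> List[str]:
        # Entities (not events) are the candidates for a locus in signing space.
        return [name for name, kind in self.referents.items() if kind == "entity"]

    def tense_of(self, event: str) -> Optional[str]:
        # Tense is a separate condition, not part of a verb's surface form.
        for predicate, *args in self.conditions:
            if predicate in TENSES and args == [event]:
                return predicate
        return None


# "I parked my car next to his truck", encoded by hand (hypothetical predicates).
drs = DRS()
x = drs.new("x")                 # the speaker
y = drs.new("y")                 # the car
z = drs.new("z")                 # the truck
w = drs.new("w")                 # the other man
e = drs.new("e", kind="event")   # the parking event
drs.add("speaker", x)
drs.add("car", y)
drs.add("owns", x, y)
drs.add("truck", z)
drs.add("male", w)
drs.add("owns", w, z)
drs.add("park", e, x, y)
drs.add("next_to", y, z)
drs.add("past", e)

print(drs.entities(), drs.tense_of(e))
# prints: ['x', 'y', 'z', 'w'] past
```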
Veale et al. 2001. Zardoz System
Zardoz System • Trinity College Dublin. • Ambitious proposal, only partly implemented. • English to many sign languages (ISL, BSL, ASL). • Hand-coded event schemata as interlingua. • Spatial, commonsense, and metaphorical reasoning. • AI Focus: • Metaphorical Reasoning • Knowledge Representation • Blackboard System Architecture
Zardoz: Reasoning for ASL • System designed to facilitate AI reasoning for visual spatial analysis, idiom decomposition, creation of new signs by metaphor. • Designers asserted that some complex ASL constructions would require a system that models and reasons about spatial relationships. • Unfortunately, the schema-based approach they propose is very time-consuming to implement, and the reasoning requires sophisticated AI.
Speers 2001. ASL Workbench System
ASL Workbench • Georgetown University, Linguistics. • Linguistics/Representation Focus: • Never implemented animation. • Uses LFG rules for analysis, f-structure transfer rules. • Uses the modern Movement-Hold model of ASL phonology as the basis of its lexical representation.
Speers 2001. Workbench: ASL Phonology • Movement-Hold Model. • A sign is represented as a sequence of time segments in which the hands either move or pause (hold); the details of each hand are specified for each segment. • Non-manual signals are not captured very well.
Workbench: ASL Phonology • Highly expressive: • Captures phenomena in modern linguistic studies. • But makes lexicon building time-consuming. • Some details not known until generation-time. • Morphology and phonology rules for ASL can be intuitively defined when you use this representation system. They seem to operate on MH time segments.
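The Movement-Hold idea can be sketched as a sequence of hold (H) and movement (M) segments, each with its own articulatory features, with morphological rules written as operations on that sequence. The `Segment` fields are a tiny hypothetical subset of a real MH feature bundle, and `reduplicate` is an illustrative rule: reduplication is one process ASL uses for aspect, but the exact segment changes here are an assumption, not Speers' or Liddell and Johnson's formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    kind: str        # "H" = hold (hands pause), "M" = movement (hands move)
    handshape: str   # hypothetical handshape label
    location: str    # for a hold: where the hand is; for a movement: where it ends


def pattern(sign: List[Segment]) -> str:
    """Return the sign's hold/movement skeleton, e.g. "HMH"."""
    return "".join(seg.kind for seg in sign)


def reduplicate(sign: List[Segment], times: int = 2) -> List[Segment]:
    """Hypothetical aspect rule: repeat the sign, inserting a return movement
    from the final hold back to the starting location between repetitions."""
    if times < 1:
        raise ValueError("times must be at least 1")
    back = Segment("M", sign[-1].handshape, sign[0].location)
    result = list(sign)
    for _ in range(times - 1):
        result.append(back)
        result.extend(sign)
    return result


# A hypothetical hold-movement-hold sign.
SIGN = [Segment("H", "B", "chest"), Segment("M", "B", "neutral"), Segment("H", "B", "neutral")]
print(pattern(SIGN), pattern(reduplicate(SIGN)))
# prints: HMH HMHMHMH
```

Because the rule only manipulates segments, it applies to any sign in the lexicon without per-sign editing, which is the intuition the slide describes.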
Development Status (ranked from Best! to Worst!) • ViSiCAST: Broadest linguistic coverage and the only system still under development. • Workbench: Intermediate ASL grammar, but few lexicon entries or transfer rules. • TEAM: Demo grammar and lexicon. • Zardoz: Limited by schema development. Minimal grammar and lexicon; system minimally implemented.
Underlying MT Architecture (1) • MT Pyramid (Dorr 1998), with the four surveyed systems placed on it, listed from top to bottom: Zardoz, ViSiCAST, Workbench, TEAM.
Underlying MT Architecture (2) • Direct (word for word, sign dictionary lookup): no systems in this survey. • As you go higher up the pyramid: • Development work increases. Bad! • The subtlety of the divergences you can handle increases. Good!
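To see why the bottom of the pyramid is unattractive, here is a minimal direct translator: each English word is looked up in a sign dictionary and English word order is kept. The dictionary entries and the fingerspelling fallback (`fs-`) are hypothetical, chosen to match the earlier question example.

```python
from typing import Dict, List, Optional

# Hypothetical sign dictionary: English word -> ASL gloss (None means "no sign").
SIGN_DICTIONARY: Dict[str, Optional[str]] = {
    "where": "WHERE",
    "does": None,
    "bob": "#BOB",
    "attend": "GO-TO",
    "college": "UNIVERSITY",
}


def direct_translate(sentence: str) -> List[str]:
    """Word-for-word translation that keeps English word order."""
    glosses = []
    for word in sentence.lower().rstrip("?.!").split():
        if word not in SIGN_DICTIONARY:
            glosses.append("fs-" + word.upper())  # assumed fallback: fingerspell it
        elif SIGN_DICTIONARY[word] is not None:
            glosses.append(SIGN_DICTIONARY[word])
    return glosses


print(" ".join(direct_translate("Where does Bob attend college?")))
# prints: WHERE #BOB GO-TO UNIVERSITY
```

Compared with the ASL gloss for this question given earlier (wh #BOB IXx GO-TO UNIVERSITY WHERE), the output lacks the index IXx and the wh marking, and it leaves WHERE in its English position. These are the kinds of divergence that motivate moving up the pyramid.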
Underlying MT Architecture (3) • These generalizations hold particularly for non-statistical systems. In non-statistical systems, we know better the sources of information: the linguistic artifacts available. • We know the limits of what the system knows. • In a statistical system, translation corpora could capture whatever information a human translator used; this data guides the system. • No ASL corpora, so no statistical systems.
ASL Generation Formalism (1) • ViSiCAST: HPSG • Operates on multi-level feature structures. Good! • Workbench: LFG • The generator decides word order and lexical choice before non-manual signals (NMS), morphology, or phonology. Bad! • Both systems are phrase-structure (PS) based. Bad!
ASL Generation Formalism (2) • ASL uses two dimensions, time and space, to differentiate the roles of lexical units. • So word order tends to be more flexible. • But PS formalisms are best suited to generation in which word ordering does most of the work. • They tend to commit to a particular ordering early. • Since word order is flexible, generation should first focus on other decisions and constraints. • Also, the ASL ‘word’ unit is hard to define.
ASL Generation Formalism (3) • TEAM: Synchronous Tree Adjoining Grammar • Determines the ASL surface tree from the English one. • But a discourse model and deeper analysis should determine reference choice, topicalization, tagging, and most NMS. Bad! • Zardoz: Spatial Dependency Graphs • Represents ASL signs and NMS-boundary tokens in a partial-ordering graph structure; the constraints can be changed or added to. • All orderings are “soft”: the generator chooses the optimal linearization. • Takes advantage of ASL word order flexibility. Good!
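Zardoz's idea of “soft” ordering constraints can be illustrated with a brute-force linearizer. Each precedence constraint between two signs carries a penalty, and the chosen linearization is the permutation with the smallest total penalty for violated constraints. The penalties, sign names, and exhaustive search are assumptions for illustration. Zardoz's actual graph structure and scoring are not reproduced here, and a practical generator would need a smarter search than trying every permutation.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# (earlier_sign, later_sign) -> penalty paid if that precedence is violated.
Constraints = Dict[Tuple[str, str], float]


def cost(order: Tuple[str, ...], constraints: Constraints) -> float:
    position = {sign: i for i, sign in enumerate(order)}
    return sum(w for (a, b), w in constraints.items() if position[a] > position[b])


def best_linearization(signs: List[str], constraints: Constraints) -> Tuple[Tuple[str, ...], float]:
    # Exhaustive search: fine for a handful of signs, exponential in general.
    best = min(permutations(signs), key=lambda order: cost(order, constraints))
    return best, cost(best, constraints)


SIGNS = ["#BOB", "IX-x", "GO-TO", "UNIVERSITY", "WHERE"]
CONSTRAINTS: Constraints = {
    ("#BOB", "IX-x"): 5.0,
    ("IX-x", "GO-TO"): 3.0,
    ("GO-TO", "UNIVERSITY"): 3.0,
    ("UNIVERSITY", "WHERE"): 1.0,  # stronger preference: sentence-final WHERE
    ("WHERE", "#BOB"): 0.5,        # weaker preference: sentence-initial WHERE
}
print(best_linearization(SIGNS, CONSTRAINTS))
```

Because the two preferences about WHERE conflict, some constraint must be violated. Lowering the penalty on ("UNIVERSITY", "WHERE") below 0.5 makes the sentence-initial ordering WHERE #BOB IX-x GO-TO UNIVERSITY the optimum instead, with no change to the rest of the constraint set.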
NMS Expressiveness • TEAM: NMS tokens: begin-furrow … end-furrow. Bad! • Zardoz: NMS tokens: begin-furrow … resume. Bad! • Workbench: Calculates NMS from the c-structure result; NMS merely complements the syntax. Bad! • ViSiCAST: NMS not implemented. Worst!
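A token encoding like TEAM's can be converted into spans over the manual signs with a single left-to-right pass. The `begin-`/`end-` spelling follows the slide. The `nms_spans` function, the example token stream (furrowed brows, which commonly mark ASL wh-questions, spread over the earlier question), and the span format (inclusive sign indices) are assumptions. The sketch also makes one limitation visible: in this encoding a non-manual signal can only start or stop at a sign boundary.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (marker, first sign index, last sign index), inclusive


def nms_spans(tokens: List[str]) -> Tuple[List[str], List[Span]]:
    """Split a token stream into manual signs and NMS spans over those signs."""
    signs: List[str] = []
    open_markers: Dict[str, int] = {}  # marker -> index of the first sign it covers
    spans: List[Span] = []
    for token in tokens:
        if token.startswith("begin-"):
            open_markers[token[len("begin-"):]] = len(signs)
        elif token.startswith("end-"):
            marker = token[len("end-"):]
            if marker not in open_markers:
                raise ValueError(f"end without begin: {marker}")
            spans.append((marker, open_markers.pop(marker), len(signs) - 1))
        else:
            signs.append(token)
    if open_markers:
        raise ValueError(f"unclosed NMS markers: {sorted(open_markers)}")
    return signs, spans


TOKENS = ["begin-furrow", "#BOB", "IX-x", "GO-TO", "UNIVERSITY", "WHERE", "end-furrow"]
print(nms_spans(TOKENS))
```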
Sign Lexicon Specification • TEAM: Parameterized motion templates; paths specified with “goal” & “via” points. Good! • Zardoz: Doll Control Language; lexical specs stored hierarchically. Good! • ViSiCAST: Signing Gesture Markup Language; sign language specific and well-suited. BEST! • Workbench: MH segments are the final output. Very ASL specific (Good!), but may be hard to animate (Bad!).
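A motion template whose path is given by a “goal” point and optional “via” points can be sampled by piecewise-linear interpolation. Straight-line segments, the `sample_path` helper, and the coordinates are assumptions made for brevity. TEAM's actual templates, and whatever curve fitting its animation engine performs, are not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def lerp(a: Point, b: Point, u: float) -> Point:
    """Linear interpolation between points a and b at parameter u in [0, 1]."""
    return tuple(x + u * (y - x) for x, y in zip(a, b))


def sample_path(start: Point, vias: Sequence[Point], goal: Point, n: int) -> List[Point]:
    """Sample n >= 2 points along start -> via points -> goal.

    Samples are evenly spaced in the path parameter (each segment gets equal
    parameter length), not in arc length.
    """
    if n < 2:
        raise ValueError("need at least two samples")
    knots = [start, *vias, goal]
    segments = len(knots) - 1
    points = []
    for i in range(n):
        s = i * segments / (n - 1)       # global parameter in [0, segments]
        k = min(int(s), segments - 1)    # index of the segment containing s
        points.append(lerp(knots[k], knots[k + 1], s - k))
    return points


# Hypothetical arc: the hand passes through a raised via point on its way to the goal.
print(sample_path((0.0, 0.0, 0.0), [(1.0, 1.0, 0.0)], (2.0, 0.0, 0.0), 5))
```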
Classifier Predicates & Space • ViSiCAST ignores classifiers; TEAM does as well. Bad! • Workbench explained how classifier signs fit into the MH Model and characterized some. Ok. • Zardoz is the only system to address how to generate them, using complex spatial reasoning on information stored in the translation schemata. Good! But not a practical approach: each individual schema is hand-coded.
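The spatial reasoning that classifier predicates need can be hinted at by mapping a small scene (simplified here to a top view) into the signing space. Each object's scene coordinates are rescaled into a box in front of the signer, and a classifier handshape is placed there. The box size, scene coordinates, handshape table, and function names are hypothetical, and real classifier choice depends on much more than position. The sketch only shows the kind of scene-to-space computation that a hand-coded schema would otherwise have to spell out.

```python
from typing import Dict, Tuple

Vec = Tuple[float, float]  # top view: (x = signer's left/right, y = near/far)

# Hypothetical category -> classifier handshape table; "3" is the vehicle
# classifier glossed ClassPred-3 in the earlier parking example.
CLASSIFIER: Dict[str, str] = {"car": "3", "truck": "3"}

# Hypothetical signing-space box in front of the signer, in metres.
SPACE_MIN: Vec = (-0.3, 0.2)
SPACE_MAX: Vec = (0.3, 0.5)


def to_signing_space(pos: Vec, scene_min: Vec, scene_max: Vec) -> Vec:
    """Linearly rescale a scene position into the signing-space box."""
    return tuple(
        smin + (p - lo) / (hi - lo) * (smax - smin)
        for p, lo, hi, smin, smax in zip(pos, scene_min, scene_max, SPACE_MIN, SPACE_MAX)
    )


def place_classifiers(scene: Dict[str, Tuple[str, Vec]], scene_min: Vec, scene_max: Vec):
    """Map each scene object to (classifier handshape, signing-space position)."""
    return {
        name: (CLASSIFIER[category], to_signing_space(pos, scene_min, scene_max))
        for name, (category, pos) in scene.items()
    }


# Hypothetical 20 m x 10 m parking lot: the truck, with my car parked beside it.
SCENE = {"truck": ("truck", (10.0, 5.0)), "my-car": ("car", (12.5, 5.0))}
print(place_classifiers(SCENE, scene_min=(0.0, 0.0), scene_max=(20.0, 10.0)))
```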
User Intervention • TEAM & Zardoz do not allow intervention. Ok. • ViSiCAST allows manual intervention during MT to fix errors before they propagate. Ok. • Workbench requires intervention in order to operate; it does not attempt any reference resolution. Bad!
Conclusions • The symbolic notation for ASL can subtly limit the potential expressiveness of the animation output if it is insufficiently detailed. • The notation should make it easy to parameterize modulations to the standard sign performance to express adverbials and morphological operations. • ASL will require a discourse representation designed for the needs of a language that refers to discourse entities spatially and unambiguously.
Conclusions • Scene representation and spatial reasoning will be required to generate classifier predicates, directional verbs, and other complex uses of the signing space. • The generation grammar formalism should allow simultaneous access to multiple levels of ASL expression and should take advantage of ASL’s word order flexibility. • Because ASL corpora are limited, non-statistical MT systems face the tradeoff between development effort and divergence handling acutely.
Future Directions • Only the ViSiCAST project is still running. • Many projects are developing potentially useful component technologies for ASL MT: • Sign Lexicons, Signing Motion Capture, Sophisticated Human Hand Models, and Annotated Sign Corpora • Some new commercial English-to-SEE systems (Vcom3D, iCommunicator). • Beginnings of an ASL research project here at the University of Pennsylvania.
Selected References (1) • N. Badler, R. Bindiganavale, J. Allbeck, W. Schuler, L. Zhao, S. Lee, H. Shin, and M. Palmer. 2000. Parameterized Action Representation and Natural Language Instructions for Dynamic Behavior Modification of Embodied Agents. AAAI Spring Symposium. ftp://ftp.cis.upenn.edu/pub/graphics/rama/papers/aaai.pdf • J. A. Bangham, S. J. Cox, R. Elliot, J. R. W. Glauert, I. Marshall, S. Rankov, and M. Wells. 2000. “Virtual signing: Capture, animation, storage and transmission - An overview of the ViSiCAST project.” IEEE Seminar on “Speech and language processing for disabled and elderly people.” • B. Dorr, P. Jordan, and J. Benoit. 1998. “A Survey of Current Paradigms in Machine Translation.” http://citeseer.nj.nec.com/555445.html • J. Holt. 1991. Demographic, Stanford Achievement Test - 8th Edition for Deaf and Hard of Hearing Students: Reading Comprehension Subgroup Results.
Selected References (2) • iCommunicator 4.0 Website. 2003. http://www.myicommunicator.com/ • S. Liddell and R. Johnson. 1989. “American Sign Language: The Phonological Base.” Sign Language Studies, 64, pages 195-277. In C. Valli & C. Lucas, 2000, Linguistics of American Sign Language, 3rd edition, Washington, DC: Gallaudet University Press. • C. Neidle, J. Kegl, D. MacLaughlin, B. Bahan, and R. G. Lee. 2000. The Syntax of American Sign Language: Functional Categories and Hierarchical Structure. Cambridge, MA: The MIT Press. • É. Sáfár and I. Marshall. 2002. “Sign language translation via DRT and HPSG.” In A. Gelbukh (Ed.) Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, CICLing, Mexico, Lecture Notes in Computer Science 2276, pages 58-68, Springer Verlag, Mexico.
Selected References (3) • d'A.L. Speers. 2001. Representation of American Sign Language for Machine Translation. PhD Dissertation, Department of Linguistics, Georgetown University. • VCom3D. 2000. SigningAvatar Frequently Asked Questions. http://www.signingavatar.com/faq/faq.html • T. Veale, A. Conway, B. Collins. 1998. “The challenges of cross-modal translation: English to sign language translation in the ZARDOZ system.” Machine Translation 13, pages 81-106. • L. Zhao, K. Kipper, W. Schuler, C. Vogler, N. Badler, and M. Palmer. 2000. “A Machine Translation System from English to American Sign Language.” Association for Machine Translation in the Americas. • See Written WPE2 Report for full references.