Semantic representation of events in 3D animation
Minhua Eunice Ma and Paul Mc Kevitt
School of Computing and Intelligent Systems, Faculty of Informatics, University of Ulster, Northern Ireland
Seanchaí: an Intelligent MultiMedia storyteller
• Homer (story generation): takes user input and generates natural language stories
• CONFUCIUS (story interpretation & presentation): takes natural language stories, or user input directly (text: stories, play/movie scripts), and produces a multimodal presentation
CONFUCIUS: story interpretation & presentation
• Input: a story in natural language from a storywriter/playwright, or a movie/drama script entered through a tailored menu for script input
• Output to the user/story listener: 3D animation, speech (dialogue), and non-speech audio
Architecture of CONFUCIUS
• Input: natural language stories, or scripts from the script writer processed by the script parser
• Natural language processing, using language knowledge (lexicon, grammar, etc.), produces semantic representations
• Mapping turns semantic representations into animation using visual knowledge (3D graphic library), including prefabricated objects (knowledge base) created with 3D authoring tools
• Output channels: animation generation, text to speech, and sound effects
• Synchronizing & fusion: a 3D world with audio in VRML
Semantic Representation Languages
• Sentence-level semantics
  • FOPC (First Order Predicate Calculus)
  • Semantic networks
  • Conceptual Dependency (CD) (Schank 1973)
  • Primitives and scripts
  • Frame-based representations (Minsky 1975)
• Verb semantics
  • Event-logic truth conditions (Siskind 1995)
  • x-schemas with f-structures (Bailey et al. 1997)
MultiModal semantic representation
• High level: multimodal semantics, an XML-based/frame-based, media-independent representation
• Intermediate level: media-dependent representations
  • Visual media-dependent representation
  • Audio media-dependent representation
• Modalities: language, visual, non-speech audio
Knowledge base of CONFUCIUS
• Language knowledge
  • Semantic knowledge: lexicons (e.g. WordNet)
  • Syntactic knowledge: grammars
  • Statistical models of language
  • Associations between words
• Visual knowledge
  • Object model (nouns): functional information, internal coordinate axes (for spatial reasoning), associations between objects
  • Event model (event verbs): describes the motion of objects
• World knowledge
  • Spatial & qualitative reasoning knowledge
Categories of events
• Atomic entities
  • Change physical location, such as position and orientation, e.g. “bounce”, “turn”
  • Change intrinsic attributes such as shape, size, color, and texture, e.g. “bend”, and even visibility, e.g. “disappear”, “fade” (in/out)
• Non-atomic entities
  • Non-character events
    • Two or more individual objects fuse together, e.g. “melt” (in)
    • One object divides into two or more individual parts, e.g. “break” (into pieces)
    • Change of sub-components (their position, size, color), e.g. “blossom”
    • Environment events (weather verbs), e.g. “snow”, “rain”
  • Character events
    • Action verbs
      • Intransitive verbs
      • Transitive verbs
    • Non-action verbs (stative, emotion, possession, mental activities, cognition & perception)
    • Idioms & metaphor verbs
Categories of action verbs
• Intransitive verbs
  • Biped kinematics, e.g. “walk”, “swim”, & other motion models like “fly”
  • Facial expressions, e.g. “laugh”, “anger”
  • Lip movement, e.g. “speak”, “say” (involves the speech modality)
• Transitive verbs
  • Single object, e.g. “throw”, “push”, “kick”
  • Multiple objects
    • Direct and indirect objects, e.g. “give”, “pass”, “show”
    • Indirect object & the tool used to perform the action, e.g. “cut”, “hammer”
Basic predicate-arguments
1) move(obj, xInc, yInc, zInc)
2) moveTo(obj, loc)
3) moveToward(obj, loc, displacement)
4) rotate(obj, xAngle, yAngle, zAngle)
5) faceTo(obj1, obj2)
6) alignMiddle(obj1, obj2, axis)
7) alignMax(obj1, obj2, axis)
8) alignMin(obj1, obj2, axis)
9) alignTouch(obj1, obj2, axis)
10) touch(obj1, obj2, axis)
11) scale(obj, rate)
12) squash(obj, rate, axis)
13) group(x, [y|_], newObj)
14) ungroup(xyList, x, yList)
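The atomic predicates read naturally as operations on an object's position, orientation and size. The Java sketch below (Java being the implementation language named in the conclusion) is a minimal, hypothetical model of several of them plus the 2nd-level moveToward(); the class names, the array-based state, and the choice to treat displacement as a scalar distance that never overshoots the target are assumptions, not the CONFUCIUS implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch of some basic predicates; not the CONFUCIUS code. */
class AtomicPredicates {

    /** Minimal object state: position, Euler angles in degrees, per-axis size. */
    static final class Obj3D {
        final double[] pos = {0, 0, 0};
        final double[] rot = {0, 0, 0};
        final double[] size = {1, 1, 1};
    }

    /** move(obj, xInc, yInc, zInc): translate by a relative offset. */
    static void move(Obj3D o, double dx, double dy, double dz) {
        o.pos[0] += dx;
        o.pos[1] += dy;
        o.pos[2] += dz;
    }

    /** moveTo(obj, loc): translate to an absolute location. */
    static void moveTo(Obj3D o, double[] loc) {
        System.arraycopy(loc, 0, o.pos, 0, 3);
    }

    /** rotate(obj, xAngle, yAngle, zAngle): add Euler angles in degrees. */
    static void rotate(Obj3D o, double ax, double ay, double az) {
        o.rot[0] += ax;
        o.rot[1] += ay;
        o.rot[2] += az;
    }

    /** scale(obj, rate): uniform scaling on all three axes. */
    static void scale(Obj3D o, double rate) {
        for (int i = 0; i < 3; i++) o.size[i] *= rate;
    }

    /** squash(obj, rate, axis): scale along one axis only (0 = x, 1 = y, 2 = z). */
    static void squash(Obj3D o, double rate, int axis) {
        o.size[axis] *= rate;
    }

    /**
     * moveToward(obj, loc, displacement): a 2nd-level predicate built from move().
     * Assumption: displacement is a scalar distance, clamped so obj never passes loc.
     */
    static void moveToward(Obj3D o, double[] loc, double displacement) {
        double[] d = {loc[0] - o.pos[0], loc[1] - o.pos[1], loc[2] - o.pos[2]};
        double len = Math.sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
        if (len == 0) return; // already at the target
        double step = Math.min(displacement, len);
        move(o, d[0] / len * step, d[1] / len * step, d[2] / len * step);
    }

    public static void main(String[] args) {
        Obj3D ball = new Obj3D();
        moveToward(ball, new double[] {0, 0, 10}, 4);
        squash(ball, 0.5, 1);
        System.out.println(Arrays.toString(ball.pos) + " " + Arrays.toString(ball.size));
    }
}
```

Building moveToward() on top of move() mirrors the slide's hierarchy: higher-level predicates only call lower-level ones, so a renderer needs to implement the atomic level alone.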
Hierarchical structure of predicates
• 3rd level: touch()
• 2nd level: moveToward(), alignMiddle(), alignTouch(), alignMax(), alignMin(), faceTo()
• Atomic level: move(), moveTo(), rotate(), scale(), squash()
Figure: front and top views (x, y, z axes) of obj1 and obj2 before and after touch() along each axis; panel captions “obj2 is on the top” and “obj1 is in the front”.
touch(obj1, obj2, x) :- alignMiddle(obj1, obj2, y), alignMiddle(obj1, obj2, z), alignTouch(obj1, obj2, x).
touch(obj1, obj2, y) :- alignMiddle(obj1, obj2, z), alignMiddle(obj1, obj2, x), alignTouch(obj1, obj2, y).
touch(obj1, obj2, z) :- alignMiddle(obj1, obj2, x), alignMiddle(obj1, obj2, y), alignTouch(obj1, obj2, z).
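A minimal sketch of how touch() can be composed from the 2nd-level alignment predicates, assuming each object is an axis-aligned bounding box (centre plus half-extents) and that axis indices 0, 1, 2 stand for x, y, z. The Box class, the reading given to alignMax()/alignMin(), and the rule that obj1 stays on whichever side of obj2 it started on are assumptions rather than the authors' definitions.

```java
/** Hypothetical sketch: touch() composed from alignMiddle() and alignTouch(). */
class TouchPredicates {

    /** Axis-aligned box: centre and half-extents on x, y, z (indices 0, 1, 2). */
    static final class Box {
        final double[] c;
        final double[] h;

        Box(double[] centre, double[] halfSize) {
            c = centre.clone();
            h = halfSize.clone();
        }

        double min(int axis) { return c[axis] - h[axis]; }
        double max(int axis) { return c[axis] + h[axis]; }
    }

    /** alignMiddle(obj1, obj2, axis): move obj1 so both centres coincide on axis. */
    static void alignMiddle(Box o1, Box o2, int axis) {
        o1.c[axis] = o2.c[axis];
    }

    /** alignMax(obj1, obj2, axis): assumed to line up the two maximum faces. */
    static void alignMax(Box o1, Box o2, int axis) {
        o1.c[axis] = o2.max(axis) - o1.h[axis];
    }

    /** alignMin(obj1, obj2, axis): assumed to line up the two minimum faces. */
    static void alignMin(Box o1, Box o2, int axis) {
        o1.c[axis] = o2.min(axis) + o1.h[axis];
    }

    /**
     * alignTouch(obj1, obj2, axis): slide obj1 along axis until its face meets obj2.
     * Assumption: obj1 stays on the side of obj2 where it started.
     */
    static void alignTouch(Box o1, Box o2, int axis) {
        if (o1.c[axis] >= o2.c[axis]) {
            o1.c[axis] = o2.max(axis) + o1.h[axis]; // obj1's min face meets obj2's max face
        } else {
            o1.c[axis] = o2.min(axis) - o1.h[axis]; // obj1's max face meets obj2's min face
        }
    }

    /** touch(obj1, obj2, axis): centre on the two other axes, then touch along axis. */
    static void touch(Box o1, Box o2, int axis) {
        for (int other = 0; other < 3; other++) {
            if (other != axis) alignMiddle(o1, o2, other);
        }
        alignTouch(o1, o2, axis);
    }

    public static void main(String[] args) {
        Box obj1 = new Box(new double[] {3, 9, 1}, new double[] {1, 1, 1});
        Box obj2 = new Box(new double[] {0, 0, 0}, new double[] {2, 2, 2});
        touch(obj1, obj2, 1); // obj1 starts above obj2, so it ends up resting on top
        System.out.println(obj1.c[0] + " " + obj1.c[1] + " " + obj1.c[2]);
    }
}
```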
Decompositional predicate-argument model: an example, “call”
First level
call(a) :-
    type(a, Person), type(tel, Telephone),
    pickup(a, tel.receiver, a.leftEar),
    dial(a, tel.keypad),
    speak(a, tel.receiver),
    putdown(a, tel.receiver, tel.set).
Second level
pickup(x, obj, dest) :-
    type(x, Person),
    moveToward(x.leftHand, location(obj), location(obj)-location(x)-5),
    touch(x.leftHand, obj, axis),
    group(x.leftHand, obj, xHandObj),
    moveToward(xHandObj, dest, _).
putdown(x, obj, dest) :-
    moveTo(x.leftHand, dest),
    ungroup(x, obj, x1),
    type(x1, Person).
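The rules for “call” form a hierarchy that can be unfolded mechanically. This hypothetical sketch keeps only predicate names (arguments and the type() checks are dropped) and expands a verb depth-first until it reaches predicates that have no rule; dial() and speak() are not defined on the slide, so they stay as leaves. The Decomposer class and its rule table are illustrations, not the CONFUCIUS data structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: expand a composite verb into lower-level predicate names. */
class Decomposer {

    // Names-only transcription of the "call" rules plus the touch() rule.
    static final Map<String, List<String>> RULES = Map.of(
        "call", List.of("pickup", "dial", "speak", "putdown"),
        "pickup", List.of("moveToward", "touch", "group", "moveToward"),
        "putdown", List.of("moveTo", "ungroup"),
        "touch", List.of("alignMiddle", "alignMiddle", "alignTouch"));

    /** Depth-first expansion; any name without a rule is treated as a leaf. */
    static List<String> expand(String predicate) {
        List<String> out = new ArrayList<>();
        expandInto(predicate, out);
        return out;
    }

    private static void expandInto(String predicate, List<String> out) {
        List<String> body = RULES.get(predicate);
        if (body == null) {
            out.add(predicate);
            return;
        }
        for (String sub : body) expandInto(sub, out);
    }

    public static void main(String[] args) {
        System.out.println(expand("call"));
    }
}
```

For “call” the expansion yields moveToward, alignMiddle, alignMiddle, alignTouch, group, moveToward, dial, speak, moveTo, ungroup: the ordered sequence of lower-level predicates an animator would play back.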
Visual definition & word sense
Diagram: verb → word sense → visual definition entry; a verb maps one-to-many onto word senses, and word senses map many-to-many onto visual definition entries (labelled polysemy and synonymy).
• Word sense: the minimal complete unit of meaning in the language modality
• Visual definition entry: the minimal complete unit of meaning in the visual modality
Example: “close” (a door)
• a normal door (rotation on y axis)
• a sliding door (moving on x axis)
• a rolling shutter door (a combination of rotation on x axis and moving on y axis)
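One way to store the “close” (a door) case, where a single word sense has several visual definition entries, is a table keyed by the object's subtype. The DoorKind enum and the numeric arguments inside the predicate strings are hypothetical placeholders; only the choice of axes (rotation on y, movement on x, rotation on x combined with movement on y) comes from the slide.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one word sense, several visual definition entries. */
class VisualDefinitions {

    enum DoorKind { NORMAL, SLIDING, ROLLING_SHUTTER }

    // Visual definition entries for the sense close(door); numbers are placeholders.
    static final Map<DoorKind, List<String>> CLOSE_DOOR = Map.of(
        DoorKind.NORMAL, List.of("rotate(door, 0, -90, 0)"),          // rotation on y
        DoorKind.SLIDING, List.of("move(door, -1, 0, 0)"),            // movement on x
        DoorKind.ROLLING_SHUTTER, List.of(                            // both together
            "rotate(door, 360, 0, 0)", "move(door, 0, -2, 0)"));

    /** Pick the visual definition entry that matches the door actually in the scene. */
    static List<String> visualise(DoorKind kind) {
        return CLOSE_DOOR.get(kind);
    }

    public static void main(String[] args) {
        for (DoorKind kind : DoorKind.values()) {
            System.out.println(kind + " -> " + visualise(kind));
        }
    }
}
```

Keying on the object rather than on the verb keeps the word sense single while letting the visual modality branch, which is the one-to-many relation the slide describes.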
Troponyms & verbs derived from adjectives/nouns
• Troponyms
  • elaborate the manner of a base verb (Fellbaum 1998)
  • examples: “trot”-“walk” (fast), “gulp”-“eat” (quickly)
  • represented as base verb + adverb: present the base verb, then modify the manner (speed, the agent’s state, duration of the activity, iteration, etc.)
• Verbs derived from adjectives or nouns
  • change objects’ properties (size, color, shape) or the world state
  • verbs with affixes such as -en, -ify, or -ize, e.g. “lengthen”
  • represented using the predicates scale() and squash(), or by changing the corresponding property fields of the object in VRML
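The “base verb + adverb” scheme means a troponym needs no visual definition of its own: it points to its base verb and carries a manner modifier. This hypothetical sketch models only two manner dimensions, speed and repetition; the speed factor of 2.0 for “trot” and “gulp” is an arbitrary placeholder, and the class names are illustrative.

```java
import java.util.Map;

/** Hypothetical sketch: a troponym stored as base verb + manner modifier. */
class Troponyms {

    /** Manner modifier; only speed and repetition are modelled (an assumption). */
    static final class Manner {
        final double speedFactor;
        final int repetitions;

        Manner(double speedFactor, int repetitions) {
            this.speedFactor = speedFactor;
            this.repetitions = repetitions;
        }
    }

    static final class Troponym {
        final String baseVerb;
        final Manner manner;

        Troponym(String baseVerb, Manner manner) {
            this.baseVerb = baseVerb;
            this.manner = manner;
        }
    }

    // "trot" = "walk" (fast), "gulp" = "eat" (quickly); 2.0 is a placeholder factor.
    static final Map<String, Troponym> TROPONYMS = Map.of(
        "trot", new Troponym("walk", new Manner(2.0, 1)),
        "gulp", new Troponym("eat", new Manner(2.0, 1)));

    /** Verb whose visual definition is reused; the verb itself if it is not a troponym. */
    static String baseVerb(String verb) {
        Troponym t = TROPONYMS.get(verb);
        return t == null ? verb : t.baseVerb;
    }

    /** Playing time: the base animation cycle sped up by speedFactor, then repeated. */
    static double duration(String verb, double baseCycleSeconds) {
        Troponym t = TROPONYMS.get(verb);
        if (t == null) return baseCycleSeconds;
        return baseCycleSeconds / t.manner.speedFactor * t.manner.repetitions;
    }

    public static void main(String[] args) {
        System.out.println(baseVerb("trot") + " for " + duration("trot", 1.0) + " s");
    }
}
```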
Representing active & passive voice
• Active and passive voice, and converse verb pairs such as “give/take”, “buy/sell”, “lend/borrow”, describe the same activity from different points of view
• Represented using the VRML Viewpoint node
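The converse pairs can be normalised to one transfer event whose participants do not depend on which verb was used; only the focus (here, the grammatical subject) changes, and that focus is what would decide which VRML Viewpoint to bind. This is a hypothetical sketch: the Transfer class and the SUBJECT_IS_SOURCE table are illustrations, and passive sentences are not handled.

```java
import java.util.Map;

/** Hypothetical sketch: converse verbs map to one event that differs only in focus. */
class ConverseVerbs {

    /** One transfer event; the animation depends on source/destination, not the verb. */
    static final class Transfer {
        final String source;
        final String destination;
        final String object;
        final String focus; // participant the camera (Viewpoint) should favour

        Transfer(String source, String destination, String object, String focus) {
            this.source = source;
            this.destination = destination;
            this.object = object;
            this.focus = focus;
        }
    }

    // true: the subject gives the object away; false: the subject receives it.
    static final Map<String, Boolean> SUBJECT_IS_SOURCE = Map.of(
        "give", true, "take", false,
        "sell", true, "buy", false,
        "lend", true, "borrow", false);

    /** E.g. interpret("take", "Mary", "John", "book") for "Mary takes the book from John". */
    static Transfer interpret(String verb, String subject, String other, String object) {
        Boolean subjectIsSource = SUBJECT_IS_SOURCE.get(verb);
        if (subjectIsSource == null) {
            throw new IllegalArgumentException("not a converse verb: " + verb);
        }
        return subjectIsSource
            ? new Transfer(subject, other, object, subject)
            : new Transfer(other, subject, object, subject);
    }

    public static void main(String[] args) {
        Transfer t = interpret("take", "Mary", "John", "book");
        System.out.println(t.source + " -> " + t.destination + ", focus on " + t.focus);
    }
}
```

“John gives Mary the book” and “Mary takes the book from John” then yield the same source, destination, and object, so the same animation can be reused and only the Viewpoint changes.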
Implementation: from semantics to VRML
Example: “A ball is bouncing”
(a) Visual definition of “bounce”
bounce(obj) :- move(obj, 0, 20, 0), move(obj, 0, -20, 0).
(b) VRML code of a static ball
DEF ball Transform {
  translation 0 0 0
  children [
    Shape {
      appearance Appearance { material Material {} }
      geometry Sphere { radius 5 }
    }
  ]
}
(c) Output VRML code of a bouncing ball
DEF ball Transform {
  translation 0 0 0
  children [
    DEF ball-TIMER TimeSensor { loop TRUE cycleInterval 0.5 },
    DEF ball-POS-INTERP PositionInterpolator {
      key [0, 0.5, 1]
      keyValue [0 0 0, 0 20 0, 0 0 0]
    },
    Shape {
      appearance Appearance { material Material {} }
      geometry Sphere { radius 5 }
    }
  ]
  ROUTE ball-TIMER.fraction_changed TO ball-POS-INTERP.set_fraction
  ROUTE ball-POS-INTERP.value_changed TO ball.set_translation
}
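The step from the visual definition (a) to the animated VRML (c) can be automated by accumulating the relative move() calls into absolute poses and emitting them as PositionInterpolator keys spaced evenly over one TimeSensor cycle. This is a hypothetical generator, not the CONFUCIUS one; it assumes the object starts at the origin, places the ROUTE statements at file scope, and adds the #VRML V2.0 utf8 header so the output is a complete VRML97 file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turn a sequence of move() steps into a VRML97 animation. */
class VrmlBounce {

    /** Accumulate relative moves into absolute poses, starting at the origin. */
    static List<double[]> keyPositions(double[][] moves) {
        List<double[]> keys = new ArrayList<>();
        double[] p = {0, 0, 0};
        keys.add(p.clone());
        for (double[] m : moves) {
            for (int i = 0; i < 3; i++) p[i] += m[i];
            keys.add(p.clone());
        }
        return keys;
    }

    /** Print whole numbers without a trailing ".0", as in the hand-written VRML. */
    static String fmt(double v) {
        return v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
    }

    static String toVrml(String name, double[][] moves, double cycleSeconds, double radius) {
        if (moves.length == 0) throw new IllegalArgumentException("need at least one move()");
        List<double[]> keys = keyPositions(moves);
        StringBuilder key = new StringBuilder();
        StringBuilder keyValue = new StringBuilder();
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) {
                key.append(", ");
                keyValue.append(", ");
            }
            // Evenly spaced key fractions in [0, 1], one per pose.
            key.append(fmt((double) i / (keys.size() - 1)));
            double[] p = keys.get(i);
            keyValue.append(fmt(p[0])).append(' ').append(fmt(p[1])).append(' ').append(fmt(p[2]));
        }
        String timer = name + "-TIMER";
        String interp = name + "-POS-INTERP";
        return "#VRML V2.0 utf8\n"
            + "DEF " + name + " Transform {\n"
            + "  translation 0 0 0\n"
            + "  children [\n"
            + "    DEF " + timer + " TimeSensor { loop TRUE cycleInterval " + fmt(cycleSeconds) + " }\n"
            + "    DEF " + interp + " PositionInterpolator {\n"
            + "      key [" + key + "]\n"
            + "      keyValue [" + keyValue + "]\n"
            + "    }\n"
            + "    Shape {\n"
            + "      appearance Appearance { material Material {} }\n"
            + "      geometry Sphere { radius " + fmt(radius) + " }\n"
            + "    }\n"
            + "  ]\n"
            + "}\n"
            + "ROUTE " + timer + ".fraction_changed TO " + interp + ".set_fraction\n"
            + "ROUTE " + interp + ".value_changed TO " + name + ".set_translation\n";
    }

    public static void main(String[] args) {
        // bounce(obj) :- move(obj, 0, 20, 0), move(obj, 0, -20, 0).
        double[][] bounce = {{0, 20, 0}, {0, -20, 0}};
        System.out.print(toVrml("ball", bounce, 0.5, 5));
    }
}
```

For the bounce definition the generated key and keyValue fields match (c): key [0, 0.5, 1] and keyValue [0 0 0, 0 20 0, 0 0 0].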
Relation to previous work
• Semantic decomposition
  • previous decompositional methodologies (e.g. Schank’s CD analysis)
  • basic predicates such as “move”, “go”, “change”
• Pros and cons
  • generative and interpretative facilities (Jackendoff 1972)
  • inadequate to capture the creative aspect of meaning
• Comparison
  • aimed at presentation purposes for visual modalities
  • no emphasis on atomic predicates
Conclusion & future work
• Conclusion
  • formalizes the meaning of action verbs
  • implemented in Java & VRML
  • reusable in other systems
• Future work
  • current inadequacies
  • vagueness problem in language visualisation (underspecification)
  • temporal relations between sub-activities
  • representing non-action verbs & adjectives
  • using other modalities (e.g. speech/audio) to aid event representation