E-HowNet: a Lexical Knowledge Representation System. Keh-Jiann Chen, Principal Investigator, Core Platforms for Digital Contents Project, TELDAP; Research Fellow, Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica.
Outline • What is E-HowNet? • E-HowNet: Sense Representation • Major Features • Current status of E-HowNet • Automatic Construction of Ontology • Apply the Framework to Metadata Representation of Digital Collections • Conclusion and Future Work
What is E-HowNet? • E-HowNet is an entity-relation model for lexical semantic representation, extended from HowNet. • E-HowNet is designed to support automatic semantic composition and decomposition.
E-HowNet: Sense Representation • Word sense definition: decompose a sense into simpler senses and sense relations • 果盤 fruit plate def: {plate|盤:telic={put|放置:location={~},patient={fruit|水果}}} • 玻璃盤 glass plate def: {plate|盤:material={glass|玻璃}} • 圓盤 round plate def: {plate|盤:shape={round|圓}}
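The brace notation on this slide is regular enough to read into a tree. Below is a minimal sketch of such a reader, not E-HowNet's own tooling: the `Concept` class and the `parse` function are hypothetical names, and the grammar is only what can be inferred from the examples (a head, then optional `role={...}` features separated by commas). Coordinated values such as `{酸}.and.{辣}` are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical tree node: a head concept plus role -> Concept features."""
    head: str                                   # e.g. "plate|盤", or "~" for self-reference
    features: dict = field(default_factory=dict)


def parse(text: str) -> Concept:
    """Parse one expression such as {plate|盤:shape={round|圓}}."""
    s = "".join(text.split())                   # whitespace is not significant here
    concept, pos = _parse_concept(s, 0)
    if pos != len(s):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return concept


def _parse_concept(s: str, pos: int):
    if pos >= len(s) or s[pos] != "{":
        raise ValueError(f"expected '{{' at position {pos}")
    pos += 1
    start = pos
    while pos < len(s) and s[pos] not in ":}":  # the head runs up to ':' or '}'
        pos += 1
    if pos >= len(s):
        raise ValueError("unbalanced braces")
    concept = Concept(head=s[start:pos])
    if s[pos] == ":":                           # a feature list follows the head
        pos += 1
        while True:
            eq = s.index("=", pos)              # raises ValueError if '=' is missing
            role = s[pos:eq]
            value, pos = _parse_concept(s, eq + 1)
            concept.features[role] = value
            if pos < len(s) and s[pos] == ",":
                pos += 1
                continue
            break
    if pos >= len(s) or s[pos] != "}":
        raise ValueError(f"expected '}}' at position {pos}")
    return concept, pos + 1


tree = parse("{plate|盤:telic={put|放置: location={~},patient={fruit|水果}}}")
print(tree.head, list(tree.features))
```

In the resulting tree for 果盤, the head is `plate|盤` and the `telic` feature holds a nested `put|放置` event whose `location` is `~`, the marker that refers back to the concept being defined.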
Principles for sense definitions • Use hypernyms and prominent properties to define concepts. • Qualia structure: agentive, telic, formal, and constitutive. • Use well-defined or primitive concepts and relations to define new concepts.
Telic • 狗食 dog food • def: {食物: telic={餵:target={狗},patient={~}}} • def: { food|食品: telic={feed|餵: target={livestock|牲畜:telic={TakeCare|照料:patient={family|家庭},agent={~}}}, patient={~}}}
Agentive • 早產兒 premature baby • def: {嬰兒:agentive={早產:patient={~}}} • def: {human|人:age={child|少兒}, agentive={labour|臨產:manner={early|早}, patient={~}}}
Formal • 彩霞 rosy clouds • def: {CloudMist|雲霧:color={colored|彩}} • 酸辣湯 spicy and sour soup • def: {湯:taste={酸}.and.{辣}} • def: {food|食品:material={StateLiquid|液態},taste={sour|酸}.and.{peppery|辣}}
Constitutive • 草裙 grass skirt • def: {裙:material={草}} • def: {clothing|衣物:telic={PutOn|穿戴: instrument={~},location={leg|腿: whole={human|人:gender={female|女}}}}, material={FlowerGrass|花草}}
Major Features • Lexical senses are expressed by either primitive concepts (sememes) or basic concepts. • Semantic relations are explicitly expressed in E-HowNet representations. • A uniform representation for function words, content words and phrases. • Taxonomy for both entities and relations. • Semantic composition and decomposition capabilities.
Uniform representation and compositional semantics • Preposition: 把|ba def: goal={} • Noun: 文章|article def: {text|語文} • Verb: 寫好|have written • def: {write|寫:aspect={Vachieve|達成}} • Phrase: 把文章寫好|The article has been written. • {write|寫:goal={text|語文}, aspect={Vachieve|達成}}
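The composition on this slide can be sketched in a few lines. The code below is an illustration under stated assumptions, not the E-HowNet composition procedure: the dictionary layout, the `LEXICON` table, and the `compose` and `render` helpers are all hypothetical. It treats the function word 把 as contributing only a role with an empty slot (`goal={}`), fills that slot with the noun's definition, and attaches the result to the verb's definition.

```python
import copy

# Hypothetical in-memory form: a content word is {"head": ..., "features": {...}};
# a function word such as 把 carries only the role whose slot it opens.
LEXICON = {
    "把": {"role": "goal"},                                        # def: goal={}
    "文章": {"head": "text|語文", "features": {}},                   # def: {text|語文}
    "寫好": {"head": "write|寫",                                    # def: {write|寫:aspect={Vachieve|達成}}
            "features": {"aspect": {"head": "Vachieve|達成", "features": {}}}},
}


def compose(preposition: str, noun: str, verb: str) -> dict:
    """Fill the preposition's empty role slot with the noun and attach it to the verb."""
    role = LEXICON[preposition]["role"]
    verb_def = copy.deepcopy(LEXICON[verb])
    # Put the newly filled role first, then keep the verb's own features.
    verb_def["features"] = {role: copy.deepcopy(LEXICON[noun]), **verb_def["features"]}
    return verb_def


def render(concept: dict) -> str:
    """Write a concept back out in brace notation."""
    feats = ",".join(f"{role}={render(value)}" for role, value in concept["features"].items())
    return "{" + concept["head"] + (":" + feats if feats else "") + "}"


print(render(compose("把", "文章", "寫好")))
```

Because function words and content words share one representation, the phrase-level result is just another expression of the same kind, so it can itself be composed further or decomposed.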
Taxonomy of E-HowNet • http://ehownet.iis.sinica.edu.tw • All|全 • entity|事物 • event|事件 • state|狀態 • Act|行動 • AttributeValue|屬性值 • object|物體 • thing|萬物 • time|時間 • space|空間 • relation|關係 • Semantic Role|語意角色 • function|函數
Current status of E-HowNet • Coarse-grained E-HowNet sense representations for about 95,000 word-sense entries of the CKIP Chinese dictionary. • About 45,000 different sense expressions • About 2,600 semantic primitives (sememes 義原) • About 200 semantic roles for objects • About 70 semantic roles for events • An automatically constructed ontology, built by attaching all word senses to the HowNet top-level ontology and structuring them.
Automatic construction of ontology • Starting from the top-level ontology (modified from the HowNet ontology), the lower-level ontology is created from the subsumption relations among E-HowNet expressions. • Attach lexical senses: words and their associated sense expressions are first attached to the top-level ontology nodes according to their head concepts. • Sub-categorization by attribute-values: lexical concepts with the same semantic head are further sub-categorized (creating a new node) according to their attribute-values. • Repeat the sub-categorization step: if many lexical concepts in one node share the same further feature values, they are sub-categorized again under a new node.
Examples: • 衣衫, {clothing|衣物} • 木屐, {clothing|衣物:location={foot|腳},material={wood|木}} • 木鞋, {clothing|衣物:location={foot|腳},material={wood|木}} • 球鞋, {clothing|衣物:location={foot|腳},while={exercise|鍛鍊}} • 溜冰鞋, {clothing|衣物:location={foot|腳},while={slide|滑:location={ice|冰},purpose={exercise|鍛鍊:domain={sport|體育}}}} • 靴子, {clothing|衣物:location={foot|腳},length={LengthLong|長}} • 運動褲, {clothing|衣物:location={leg|腿},while={exercise|鍛鍊}} • 褲子, {clothing|衣物:location={leg|腿}} • 內衣, {clothing|衣物:qualification={private|私}} • 禮服, {clothing|衣物:qualification={formal|正式}} • 白紗, {clothing|衣物:qualification={formal|正式},owner={human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}}} • 婚紗, {clothing|衣物:qualification={formal|正式},owner={human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}}}
Attach all lexical senses: • {clothing|衣物} [衣衫, 木屐, 木鞋, 球鞋, 溜冰鞋, 靴子, 運動褲, 褲子, 內衣, 禮服, 白紗, 婚紗]
Sub-categorization by attribute-values: • {clothing|衣物} [衣衫] • 鞋子|shoes [木屐, 木鞋, 球鞋, 溜冰鞋, 靴子] • 褲子|trousers [褲子, 運動褲] • 內衣|underwear [內衣] • 禮服|ceremonial robe/dress [禮服, 白紗, 婚紗]
Repeat sub-categorization step: • {clothing|衣物} [衣衫] • 鞋子|shoes [球鞋, 溜冰鞋, 靴子] • {木屐} [木屐, 木鞋] • 褲子|trousers [褲子, 運動褲] • 內衣|underwear [內衣] • 禮服|ceremonial robe/dress [禮服] • {白紗} [白紗, 婚紗]
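The attach, sub-categorize, and repeat steps walked through above can be approximated as follows. This is a simplified sketch, not the actual construction algorithm: `split_node` is a hypothetical helper, features are grouped in the order they are written, nested values are kept as opaque strings, and the `min_size` threshold (1 for the first pass, 2 for the repeat pass) is an assumption chosen so that the output matches the slides.

```python
from collections import defaultdict

# Each entry: (word, ordered (role, value) pairs under the head {clothing|衣物}).
ENTRIES = [
    ("衣衫", []),
    ("木屐", [("location", "foot|腳"), ("material", "wood|木")]),
    ("木鞋", [("location", "foot|腳"), ("material", "wood|木")]),
    ("球鞋", [("location", "foot|腳"), ("while", "exercise|鍛鍊")]),
    ("溜冰鞋", [("location", "foot|腳"),
              ("while", "slide|滑:location={ice|冰},purpose={exercise|鍛鍊:domain={sport|體育}}")]),
    ("靴子", [("location", "foot|腳"), ("length", "LengthLong|長")]),
    ("運動褲", [("location", "leg|腿"), ("while", "exercise|鍛鍊")]),
    ("褲子", [("location", "leg|腿")]),
    ("內衣", [("qualification", "private|私")]),
    ("禮服", [("qualification", "formal|正式")]),
    ("白紗", [("qualification", "formal|正式"),
             ("owner", "human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}")]),
    ("婚紗", [("qualification", "formal|正式"),
             ("owner", "human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}")]),
]


def split_node(entries, depth, min_size):
    """One sub-categorization step: group a node's entries by their feature at `depth`.

    Returns (entries that stay in the node, {feature: entries of the new child node}).
    """
    groups = defaultdict(list)
    stay = []
    for word, feats in entries:
        if len(feats) > depth:
            groups[feats[depth]].append((word, feats))
        else:
            stay.append((word, feats))          # nothing left to distinguish it
    children = {}
    for feature, members in groups.items():
        if len(members) >= min_size:
            children[feature] = members         # enough members: create a new node
        else:
            stay.extend(members)                # too few: keep them in the parent
    return stay, children


def words(entries):
    return [word for word, _ in entries]


# First pass: every attribute-value found under {clothing|衣物} becomes a node.
top, level1 = split_node(ENTRIES, depth=0, min_size=1)
print("{clothing|衣物}", words(top))
for feature, members in level1.items():
    # Repeat pass: split again only where at least two words share a value.
    stay, level2 = split_node(members, depth=1, min_size=2)
    print(" ", feature, words(stay))
    for sub_feature, sub_members in level2.items():
        print("   ", sub_feature, words(sub_members))
```

The sketch labels each new node by its attribute-value pair; the readable labels on the slides (鞋子|shoes, 褲子|trousers, and so on) are not derived here.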
Apply the Framework to Metadata Representation of Digital Collections • 奉華紙槌瓶 Fenghua mallet vase = {瓷瓶:Time={北宋},Type={汝窯}} • 瓷瓶 porcelain vase = {瓶子:material={瓷}} • 奉華紙槌瓶 = {瓶子:material={瓷},Time={北宋},Type={汝窯}} (北宋: Northern Song dynasty; 汝窯: Ru ware)
Apply the Framework to Metadata Representation of Digital Collections • 青瓷水仙盆 celadon narcissus basin = {瓷盆:Time={北宋},Type={汝窯},Telic={水仙}} • 瓷盆 porcelain basin = {盆:material={瓷}} • 青瓷水仙盆 = {盆:material={瓷},Time={北宋},Type={汝窯},Telic={水仙}} (水仙: narcissus)
Apply the Framework to Metadata Representation of Digital Collections • 奉華紙槌瓶={瓶子:material={瓷},Time={北宋},Type={汝窯}} • 青瓷水仙盆={盆:material={瓷},Time={北宋},Type={汝窯},Telic={水仙}}
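The expansion shown on these slides, where the head concept of a definition is replaced by its own definition, can be sketched as a short loop. This is a minimal illustration under assumptions: the `DEFS` table and the `expand` and `render` helpers are hypothetical, feature values are kept as plain strings, and a more specific feature is assumed to override an inherited one with the same role.

```python
# word -> (head concept, features), copied from the two collection items above
DEFS = {
    "奉華紙槌瓶": ("瓷瓶", {"Time": "北宋", "Type": "汝窯"}),
    "瓷瓶": ("瓶子", {"material": "瓷"}),
    "青瓷水仙盆": ("瓷盆", {"Time": "北宋", "Type": "汝窯", "Telic": "水仙"}),
    "瓷盆": ("盆", {"material": "瓷"}),
}


def expand(word: str):
    """Replace the head by its own definition until the head has no entry in DEFS."""
    head, feats = DEFS[word]
    merged = dict(feats)
    while head in DEFS:
        head, inherited = DEFS[head]
        # Inherited features come first; the more specific definition wins on conflict.
        merged = {**inherited, **merged}
    return head, merged


def render(head: str, feats: dict) -> str:
    """Write a (head, features) pair back out in brace notation."""
    body = ",".join(f"{role}={{{value}}}" for role, value in feats.items())
    return "{" + head + (":" + body if body else "") + "}"


for item in ("奉華紙槌瓶", "青瓷水仙盆"):
    print(item, "=", render(*expand(item)))
```

Once both items are expanded to this level, their shared metadata (material 瓷, Time 北宋, Type 汝窯) appears under the same roles, which is what makes the two records directly comparable.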
Conclusion and Future Work • E-HowNet sense representations are updated from time to time. • The ontology can be rebuilt automatically from the refined expressions. • New categories in the taxonomy can be identified and characterized by their specific attribute-values. • Uniform representations of function words and content words facilitate semantic composition and decomposition. • Because of E-HowNet's semantic decomposition capability, the primitive representations of surface sentences with the same deep semantics are nearly canonical.