HowNet and Computation of Meaning
Zhendong Dong
dzd@keenage.com
WWW.keenage.com
GWC-06, Jeju, Korea, 2006-01-22
Outlines • Bird’s-eye view of HowNet • Prominent features
Bird’s-eye view of HowNet • What is HowNet? • History of HowNet • Statistics on latest version • Composition of HowNet
What is HowNet? • HowNet is an online extralinguistic knowledge system for the computation of meaning in human language technology (HLT). • HowNet unveils inter-concept relations and inter-attribute relations of the concepts as connoted in its Chinese-English lexicon.
History of HowNet • 1988: basic research started • 1999: 1st version released • 2000: revision of KDML started • 2002: new version released
Statistics - general
• Chinese words & expressions: 84102
• English words & expressions: 80250
• Chinese meanings: 98530
• English meanings: 100071
• Definitions: 25295
• Records: 161743
A record in the HowNet dictionary
NO.=076856
W_C=买主
G_C=N [mai3 zhu3]
E_C=
W_E=buyer
G_E=N
E_E=
DEF={human|人:domain={commerce|商业},{buy|买:agent={~}}}
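To make the record layout concrete, here is a minimal Python sketch that reads such a record into a dictionary. It assumes each field sits on its own line as KEY=value and that only the first "=" separates key from value (the DEF field itself contains "="). The helper name parse_record is hypothetical and not part of any HowNet distribution.

```python
def parse_record(text: str) -> dict[str, str]:
    """Parse one HowNet-style record into a dict of field -> value.

    Assumes one KEY=value field per line; empty values (e.g. "E_C=")
    are kept as empty strings.
    """
    record = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition("=")  # split on the first "=" only
        if not sep:
            continue  # skip lines that are not KEY=value fields
        record[key.strip()] = value.strip()
    return record


RECORD = """\
NO.=076856
W_C=买主
G_C=N [mai3 zhu3]
E_C=
W_E=buyer
G_E=N
E_E=
DEF={human|人:domain={commerce|商业},{buy|买:agent={~}}}
"""

rec = parse_record(RECORD)
print(rec["W_E"], rec["G_E"])  # buyer N
print(rec["DEF"])
```

Splitting on the first "=" only keeps the nested DEF expression intact, so it can be handed to a separate DEF parser.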
Statistics - semantic (Chinese / English)
• Thing: 58153 / 58096
• Component: 7025 / 7023
• Time: 2238 / 2244
• Space: 1071 / 1071
• Attribute: 3776 / 4045
• Attribute-value: 9089 / 8478
• Event: 12634 / 10076
Statistics – main syntactic categories (Chinese / English)
• ADJ: 11705 / 9576
• ADV: 1516 / 2084
• VERB: 25929 / 21017
• NOUN: 46867 / 48342
• PRON: 112 / 71
• NUM: 225 / 242
• PREP: 128 / 113
• AUX: 77 / 49
• CLA (classifier): 424 / 0
Statistics – selected relations
• Chinese: synonym sets 13463 (word forms 54312); antonym sets 12777; converse sets 6753
• English: synonym sets 18575 (word forms 58488); antonym sets 12032; converse sets 6442
Composition • Database • Tools for computation of meaning
Database • Dictionary • Taxonomies • Axiomatic relations & role shifting
Taxonomies - 10 • Entity • Event • Attribute • AttributeValue • Secondary features • Event roles • Typical actors of event roles • Event relations and role shifting • Antonymous sememe pairs • Converse sememe pairs
Tools for computation of meaning • Browser • Secondary resources
Prominent features • All syntactic classes of words included • Sememes and semantic roles • Defining concepts in KDML on the basis of sememes and semantic roles • Relations – the soul of HowNet • Relations obtained by computation rather than manual coding • Identical representation in various linguistic structures
Sememes: 2099
• Entity: 151 (thing (physical, mental, fact); component (part, fitting); time; space (direction, location))
• Event: 812 (relation, state; action)
• Attribute: 247
• AttributeValue: 889
• Secondary feature: 121
Semantic roles: 91
(1) Main semantic roles
  (a) principal semantic roles: 6
  (b) affected semantic roles: 11
(2) Peripheral semantic roles
  (a) time: 12
  (b) space: 11
  (c) resultant: 8
  (d) manner: 11
  (e) modifier: 16
  (f) basis: 6
  (g) comparison: 2
  (h) coordination: 6
  (i) commentary: 2
Defining concepts (1)
W_E=doctor G_E=V DEF={doctor|医治}
W_E=doctor G_E=N DEF={human|人:HostOf={Occupation|职位},domain={medical|医},{doctor|医治:agent={~}}}
W_E=doctor G_E=N E_E= DEF={human|人:{own|有:possession={Status|身分:domain={education|教育},modifier={HighRank|高等:degree={most|最}}},possessor={~}}}
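A DEF is a nested structure: a head sememe, then role=value pairs and unlabeled event frames, each of which may nest further. The Python sketch below parses that shape into a tree. Its grammar is inferred from the examples on these slides, not taken from a KDML specification, and it assumes whitespace inside a DEF is insignificant; the names Node, DefParser and show are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    concept: str                                  # e.g. "human|人", "~" or "*"
    children: list = field(default_factory=list)  # (role or None, Node) pairs


class DefParser:
    """Recursive-descent parser for a simplified subset of KDML DEF strings.

    Assumed grammar (inferred from examples, not from a KDML spec):
      node := "{" concept [":" item ("," item)*] "}"
      item := node | role "=" node+
    """

    def __init__(self, text: str):
        self.s = "".join(text.split())  # assume whitespace is insignificant
        self.i = 0

    def peek(self) -> str:
        return self.s[self.i] if self.i < len(self.s) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.i}")
        self.i += 1

    def read_until(self, stops: str) -> str:
        start = self.i
        while self.peek() and self.peek() not in stops:
            self.i += 1
        if not self.peek():
            raise ValueError("unexpected end of DEF")
        return self.s[start:self.i]

    def parse(self) -> Node:
        root = self.node()
        if self.peek():
            raise ValueError(f"trailing text at position {self.i}")
        return root

    def node(self) -> Node:
        self.expect("{")
        n = Node(self.read_until(":}"))
        if self.peek() == ":":
            self.i += 1
            n.children.extend(self.item())
            while self.peek() == ",":
                self.i += 1
                n.children.extend(self.item())
        self.expect("}")
        return n

    def item(self) -> list:
        if self.peek() == "{":          # unlabeled node, e.g. an event frame
            return [(None, self.node())]
        role = self.read_until("=")
        self.i += 1                     # skip "="
        values = [(role, self.node())]
        while self.peek() == "{":       # adjacent alternatives, e.g. {human|人}{group|群体->}
            values.append((role, self.node()))
        return values


def show(node: Node, role=None, depth: int = 0) -> None:
    """Print a parsed DEF as an indented tree, one concept per line."""
    prefix = f"{role}=" if role else ""
    print("  " * depth + prefix + node.concept)
    for child_role, child in node.children:
        show(child, child_role, depth + 1)


show(DefParser("{human|人:HostOf={Occupation|职位},domain={medical|医},{doctor|医治:agent={~}}}").parse())
# human|人
#   HostOf=Occupation|职位
#   domain=medical|医
#   doctor|医治
#     agent=~
```

The same parser accepts the deeper second noun sense of "doctor", where own|有 nests Status|身分, which in turn nests HighRank|高等.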
Defining concepts (2)
W_E=buy G_E=V DEF={buy|买}
cf. (WordNet) obtain by purchase; acquire by means of a financial transaction
W_E=buy G_E=V DEF={GiveAsGift|赠:manner={guilty|有罪},purpose={entice|勾引}}
cf. (WordNet) make illegal payments to in exchange for favors or influence
Relations – the soul of HowNet • Meaning is represented by relations • Computation of meaning is based on relations
1. Event Frame ~ Verb frame
{event|事件}
  ├ {static|静态}  {event|事件}
  │ ├ {relation|关系}  {static|静态}
  │ │ ├ {possession|领属关系}  {relation|关系}
  │ │ │ ├ {own|有}  {possession|领属关系:possessor={*},possession={*}}
  │ │ │ │ ├ {obtain|得到}  {own|有:possessor={*},possession={*},source={*}}
  └ {act|行动}  {event|事件:agent={*}}
    ├ {ActGeneral|泛动}  {act|行动:agent={*}}
    └ {ActSpecific|实动}  {act|行动:agent={*}}
      └ {AlterSpecific|实变}  {ActSpecific|实动:agent={*}}
        ├ {AlterRelation|变关系}  {AlterSpecific|实变:agent={*}}
        │ ├ {AlterPossession|变领属}  {AlterRelation|变关系:agent={*},possession={*}}
        │ │ ├ {take|取}  {AlterPossession|变领属:agent={*},possession={*},source={*}}
        │ │ │ ├ {buy|买}  {take|取:agent={*},possession={*},source={*},cost={*},beneficiary={*}}
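The event-frame tree gives each sememe a parent and a role frame. The Python sketch below transcribes the excerpt, walks the hypernym chain of {buy|买}, and checks one property the excerpt appears to exhibit: every child's role frame contains its parent's. That property is read off this excerpt, not a documented HowNet rule, and the names TAXONOMY, ancestors and extends_parent_frame are hypothetical.

```python
# (parent, role frame) for each event sememe, transcribed from the tree excerpt.
TAXONOMY = {
    "event": (None, set()),
    "static": ("event", set()),
    "relation": ("static", set()),
    "possession": ("relation", set()),
    "own": ("possession", {"possessor", "possession"}),
    "obtain": ("own", {"possessor", "possession", "source"}),
    "act": ("event", {"agent"}),
    "ActGeneral": ("act", {"agent"}),
    "ActSpecific": ("act", {"agent"}),
    "AlterSpecific": ("ActSpecific", {"agent"}),
    "AlterRelation": ("AlterSpecific", {"agent"}),
    "AlterPossession": ("AlterRelation", {"agent", "possession"}),
    "take": ("AlterPossession", {"agent", "possession", "source"}),
    "buy": ("take", {"agent", "possession", "source", "cost", "beneficiary"}),
}


def ancestors(sememe: str) -> list[str]:
    """Return the hypernym chain of a sememe, nearest parent first."""
    chain = []
    parent = TAXONOMY[sememe][0]
    while parent is not None:
        chain.append(parent)
        parent = TAXONOMY[parent][0]
    return chain


def extends_parent_frame(sememe: str) -> bool:
    """True if the sememe's role frame contains every role in its parent's frame."""
    parent = TAXONOMY[sememe][0]
    return parent is None or TAXONOMY[parent][1] <= TAXONOMY[sememe][1]


print(" < ".join(["buy"] + ancestors("buy")))
# buy < take < AlterPossession < AlterRelation < AlterSpecific < ActSpecific < act < event
print(all(extends_parent_frame(s) for s in TAXONOMY))    # True
print(sorted(TAXONOMY["buy"][1] - TAXONOMY["take"][1]))  # ['beneficiary', 'cost']
```

The last line shows what {buy|买} adds to {take|取}: the cost and beneficiary roles.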
2. Typical actors of event roles ~ VerbNet
{buy|买}  {take|取:agent={human|人}{group|群体->}, possession={artifact|人工物->}, source={human|人}{InstitutePlace|场所}, cost={money|货币}, beneficiary={human|人}{group|群体->}, domain={economy|经济}}
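Typical actors act as selectional restrictions on each role. The Python sketch below checks a proposed filler for each role of {buy|买} against the categories listed for it. It treats each filler as carrying a single top-level category, ignores the "->" suffix (whose exact meaning the slide does not spell out), and leaves out the domain tag; the word-to-category table and the function check_buy_frame are hypothetical, for illustration only.

```python
# Typical actors of {buy|买}, transcribed from the slide; "->" suffixes are ignored.
BUY_ACTORS = {
    "agent": {"human", "group"},
    "possession": {"artifact"},
    "source": {"human", "InstitutePlace"},
    "cost": {"money"},
    "beneficiary": {"human", "group"},
}

# Hypothetical word -> top-level category assignments, for illustration only.
CATEGORY = {"buyer": "human", "car": "artifact", "shop": "InstitutePlace", "cash": "money"}


def check_buy_frame(frame: dict[str, str]) -> list[str]:
    """List the role fillers that violate the typical-actor constraints of buy."""
    problems = []
    for role, word in frame.items():
        allowed = BUY_ACTORS.get(role)
        if allowed is None:
            problems.append(f"{role}: not a role of buy")
        elif CATEGORY.get(word) not in allowed:
            problems.append(f"{role}: {word} is not one of {sorted(allowed)}")
    return problems


print(check_buy_frame({"agent": "buyer", "possession": "car", "source": "shop", "cost": "cash"}))
# []
print(check_buy_frame({"agent": "car", "possession": "buyer"}))
# ["agent: car is not one of ['group', 'human']", "possession: buyer is not one of ['artifact']"]
```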
Axiomatic Relations & Role Shifting - 1
{buy|买} <----> {obtain|得到} [consequence]; agent OF {buy|买}=possessor OF {obtain|得到}; possession OF {buy|买}=possession OF {obtain|得到}.
{buy|买} <----> {obtain|得到} [consequence]; beneficiary OF {buy|买}=possessor OF {obtain|得到}; possession OF {buy|买}=possession OF {obtain|得到}.
{buy|买} <----> {obtain|得到} [consequence]; source OF {buy|买}=source OF {obtain|得到}; possession OF {buy|买}=possession OF {obtain|得到}.
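Role shifting can be read as renaming the roles of one event instance to obtain an instance of a related event. The Python sketch below applies the three buy-to-obtain mappings to a hypothetical buy instance; the fillers (Mary, Tom, book, bookstore, $20) and the function shift_roles are illustrative assumptions, and roles an axiom does not mention are simply dropped.

```python
# Role mappings transcribed from the three buy -> obtain [consequence] axioms.
BUY_TO_OBTAIN = [
    {"agent": "possessor", "possession": "possession"},
    {"beneficiary": "possessor", "possession": "possession"},
    {"source": "source", "possession": "possession"},
]


def shift_roles(instance: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename the roles of an event instance per one axiom; unmapped roles are dropped."""
    return {mapping[role]: filler for role, filler in instance.items() if role in mapping}


# Hypothetical buy instance, for illustration only.
buy = {"agent": "Mary", "possession": "book", "source": "bookstore",
       "cost": "$20", "beneficiary": "Tom"}

for mapping in BUY_TO_OBTAIN:
    print(shift_roles(buy, mapping))
# {'possessor': 'Mary', 'possession': 'book'}
# {'possession': 'book', 'possessor': 'Tom'}
# {'possession': 'book', 'source': 'bookstore'}
```

Because the mappings are data rather than code, the same function serves the [entailment] axioms too, e.g. with cost mapped to the possession of {pay|付}.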
Axiomatic Relations & Role Shifting - 2
{buy|买} [entailment] <----> {choose|选择}; agent OF {buy|买}=agent OF {choose|选择}; possession OF {buy|买}=content OF {choose|选择}; source OF {buy|买}=location OF {choose|选择}.
{buy|买} [entailment] <----> {pay|付}; agent OF {buy|买}=agent OF {pay|付}; cost OF {buy|买}=possession OF {pay|付}; source OF {buy|买}=target OF {pay|付}.
Axiomatic Relations & Role Shifting - 3 {buy|买} (X) <----> {sell|卖} (Y) [mutual implication]; agent OF {buy|买}=target OF {sell|卖}; source OF {buy|买}=agent OF {sell|卖}; possession OF {buy|买}=possession OF {sell|卖}; cost OF {buy|买}=cost OF {sell|卖}.
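Mutual implication means the buy/sell correspondence can be applied in either direction. The Python sketch below inverts the buy-to-sell role mapping and checks that shifting a hypothetical buy instance to sell and back recovers it. It assumes the mapping is one-to-one, which holds for this axiom; the instance uses only roles the axiom covers (beneficiary is not mentioned, so it would not survive the round trip). The names invert and shift_roles are hypothetical.

```python
# Role correspondences from the buy <-> sell [mutual implication] axiom.
BUY_TO_SELL = {"agent": "target", "source": "agent",
               "possession": "possession", "cost": "cost"}


def invert(mapping: dict[str, str]) -> dict[str, str]:
    """Invert a role mapping; only possible when it is one-to-one."""
    inverse = {new: old for old, new in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("role mapping is not one-to-one")
    return inverse


def shift_roles(instance: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename roles per the mapping, dropping roles the axiom does not cover."""
    return {mapping[role]: filler for role, filler in instance.items() if role in mapping}


SELL_TO_BUY = invert(BUY_TO_SELL)

# Hypothetical buy instance, for illustration only.
buy = {"agent": "Mary", "source": "bookstore", "possession": "book", "cost": "$20"}
sell = shift_roles(buy, BUY_TO_SELL)
print(sell)                                   # {'target': 'Mary', 'agent': 'bookstore', 'possession': 'book', 'cost': '$20'}
print(shift_roles(sell, SELL_TO_BUY) == buy)  # True
```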
Identical representation - 1
W_E=smuggle G_E=V DEF={transport|运送:manner={guilty|有罪}}
W_E=drug G_E=N DEF={addictive|嗜好物:modifier={guilty|有罪}}
Identical representation - 2
W_E=smuggling of drugs G_E=N DEF={fact|事情:CoEvent={transport|运送:manner={guilty|有罪},patient={addictive|嗜好物:modifier={guilty|有罪}}}}
W_E=drug smuggler G_E=N DEF={community|团体:{transport|运送:agent={~},manner={unlawful|非法},patient={addictive|嗜好物},purpose={sell|卖}}}
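One rough way to see the shared representation is to compare the sememes that occur in each DEF. The Python sketch below extracts sememe names with a regular expression and checks which sememes of "smuggle" and "drug" also appear in each phrase. This is a crude set comparison that ignores roles and nesting, not HowNet's own mechanism; the names sememes, DEFS and parts are hypothetical.

```python
import re

# A sememe is written {EnglishName|中文}; capture the English name.
SEMEME = re.compile(r"\{([A-Za-z]+)\|")


def sememes(definition: str) -> set[str]:
    """Return the set of sememe names occurring anywhere in a DEF."""
    return set(SEMEME.findall(definition))


DEFS = {
    "smuggle": "{transport|运送:manner={guilty|有罪}}",
    "drug": "{addictive|嗜好物:modifier={guilty|有罪}}",
    "smuggling of drugs": "{fact|事情:CoEvent={transport|运送:manner={guilty|有罪},"
                          "patient={addictive|嗜好物:modifier={guilty|有罪}}}}",
    "drug smuggler": "{community|团体:{transport|运送:agent={~},manner={unlawful|非法},"
                     "patient={addictive|嗜好物},purpose={sell|卖}}}",
}

parts = sememes(DEFS["smuggle"]) | sememes(DEFS["drug"])
for phrase in ("smuggling of drugs", "drug smuggler"):
    # Sememes of the component words that the phrase's DEF does not contain.
    print(phrase, sorted(parts - sememes(DEFS[phrase])))
# smuggling of drugs []
# drug smuggler ['guilty']
```

The check finds every component sememe inside "smuggling of drugs", while "drug smuggler" expresses the manner with {unlawful|非法} instead of {guilty|有罪}.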
Motivation to develop secondary resources • To check the HowNet knowledge data from different angles for precision and consistency • To provide users with tools for applications • Practicable for any sense of any word
Secondary resources • Concept Relevance Calculator (CRC) • Concept Similarity Measure (CSM) • Query Expansion Tool (QET) • Chinese Morphological Processor (CMP) • Chinese Message Analyzer (CMA)
Concept similarity
doctor 2 <> dentist 0.300000
doctor 1 <> dentist 0.883333
doctor 1 <> nurse 1 0.620000
doctor 1 <> nurse 2 0.454545
doctor 1 <> patient 0.203636
walk <> run 0.144444
walk <> jump 0.144444
walk <> swim 0.130159
walk <> fly 0.124444
walk <> buy 0.018605
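The slides do not give the formula behind the Concept Similarity Measure, so the Python sketch below swaps in a much simpler stand-in: Jaccard overlap of the sememe sets in two DEFs. It ignores roles and nesting and will not reproduce the CSM figures above; it only illustrates that two senses sharing little structure score low. The two DEFs are the noun senses of "doctor" from the "Defining concepts (1)" slide; the function names are hypothetical.

```python
import re


def sememes(definition: str) -> set[str]:
    """Return the set of sememe names ({EnglishName|中文}) in a DEF."""
    return set(re.findall(r"\{([A-Za-z]+)\|", definition))


def jaccard(def_a: str, def_b: str) -> float:
    """Jaccard overlap of two DEFs' sememe sets (a stand-in, not CSM)."""
    a, b = sememes(def_a), sememes(def_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


DOCTOR_MEDICAL = "{human|人:HostOf={Occupation|职位},domain={medical|医},{doctor|医治:agent={~}}}"
DOCTOR_DEGREE = ("{human|人:{own|有:possession={Status|身分:domain={education|教育},"
                 "modifier={HighRank|高等:degree={most|最}}},possessor={~}}}")

print(round(jaccard(DOCTOR_MEDICAL, DOCTOR_DEGREE), 3))  # 0.111 (only human|人 is shared)
```

A measure closer to the numbers above would presumably also weigh the roles and the taxonomic distance between sememes, which this sketch leaves out.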
Conclusion • Extralinguistic knowledge is indispensable for HLT • The knowledge should form a computer-oriented system • It should be large enough; a toy example is of no use • It should be able to carry out computation of meaning
Thank you! Welcome to www.keenage.com! Download and try Mini-HowNet.