
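Static Expression and Dynamic Activation

Static Expression and Dynamic Activation. Dong Zhendong (董振东), dzd@keenage.com, WWW.keenage.com. Tsinghua, 2007-12. Outline: opening remarks (what is HowNet not?), an overview of the HowNet system, HowNet's innovations, conclusion.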


Presentation Transcript


  1. Static Expression and Dynamic Activation. Dong Zhendong (董振东), dzd@keenage.com, WWW.keenage.com, Tsinghua, 2007-12

  2. Outline • Opening remarks: what is HowNet not? • An overview of the HowNet system • HowNet's innovations • Conclusion

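  3. Opening remarks: what is HowNet not? (1) For Chinese, there is also already a resource resembling a wordnet, called HowNet (《知网》, http://www.keenage.com). HowNet's approach is distinctive: it does not adopt the architecture of the English WordNet but uses an architecture of its own. It first defines an ontology of world knowledge and then draws distinctions within that definition. This top-down method differs from the bottom-up method of the English and European wordnets, and it certainly has its merits. Unfortunately, because of the limits on resources and information at the time, it was not aligned with related research elsewhere in the world. There is essentially no architectural basis for linking it to the wordnets of other languages, and its upper-level knowledge classification rests on the two authors' own subjective judgment. It cannot be called wrong, but it lacks a theoretical foundation and faces interoperability problems with other systems.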

  4. Opening remarks: what is HowNet not? (2) In recent years he has also said, on another occasion: “HowNet is a database/network of semantic relationships among Chinese words. Conceptually it’s similar to WordNet of English, but the author claims they differ substantially. For one thing, HowNet is NOT free. Well, they are making words A-D free for download, as a teaser.”

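  5. Opening remarks: what is HowNet not? (3) • HowNet is not a semantic dictionary, a thesaurus, a concept dictionary, or an English-Chinese bilingual dictionary; HowNet is not a dictionary • HowNet is not a Sinicized WordNet, nor a Chinese substitute for WordNet • HowNet is not a product of linguistic research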

  6. An overview of the HowNet system • Data statistics • System components

  7. Data statistics. Chinese character: 7152; Chinese word & expression: 92159; English word & expression: 86141; Chinese meaning: 106591; English meaning: 106731; Definition: 27877; Record: 172097

  8. HowNet's innovations • Theoretical innovation • Innovation in knowledge acquisition and representation • The knowledge power of HowNet

  9. Theoretical innovation • Theory of knowledge • Relations among event concepts: the two-axis theory

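  10. Innovation in knowledge acquisition and representation • Acquisition and selection of sememes • Organization of sememes and construction of the taxonomy • Defining concepts in a structured language (KDML), including concepts expressed by words of two languages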

  11. Acquisition and selection of sememes. Sememes: 2090. Entity: 150 [thing (physical, mental, fact); component (part, fitting); time; space (direction, location)]. Event (relation, state; action): 810. Attribute: 245. AttributeValue: 885. Secondary feature: 121

  12. Organization of sememes and construction of the taxonomy • Entity • Event • Attribute • AttributeValue • Secondary features • Event roles • Typical actors of event roles • Axiomatic relations and role shifting • Antonymous sememe pairs • Converse sememe pairs

  13. Definition of concepts in HowNet (1) Concept definitions in HowNet – “buy”: 1. {GiveAsGift|赠:manner={guilty|有罪}, purpose={entice|勾引}} 2. {buy|买}. Cf. Synset definition in WordNet – “buy”: 1. buy, purchase (obtain by purchase) 2. bribe, corrupt, buy, make grease palm (make illegal payment)

  14. Definition of concepts in HowNet (2) Concept definitions in HowNet – “buyer”: {human|人:domain={commerce|商业},{buy|买:agent={~}}}. Cf. Synset definition in WordNet – “buyer”: buyer, purchaser, emptor, vendee (a person who buys). Which “buy”? It is ambiguous in WordNet, but unambiguous in HowNet.
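The bracketed definitions on slides 13 and 14 are nested structures: a head sememe, optionally followed by role=value pairs and embedded concepts. The following is a minimal sketch of how such a string might be read into a tree. It assumes a simplified grammar inferred only from the examples in this transcript, not the official KDML specification, and `parse_def` and the dictionary layout are hypothetical names chosen for illustration.

```python
def parse_def(text):
    """Parse one brace-delimited definition into nested dicts.

    Assumed grammar (inferred from the slide examples only):
      concept := '{' head [ ':' item (',' item)* ] '}'
      item    := role '=' concept+  |  concept
    """
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def concept():
        nonlocal pos
        skip_ws()
        if pos >= len(text) or text[pos] != "{":
            raise ValueError("expected '{' at position %d" % pos)
        pos += 1
        start = pos
        while pos < len(text) and text[pos] not in ":}":
            pos += 1
        node = {"head": text[start:pos].strip(), "roles": {}, "embedded": []}
        if pos < len(text) and text[pos] == ":":
            pos += 1
            while True:
                item(node)
                skip_ws()
                if pos < len(text) and text[pos] == ",":
                    pos += 1
                    continue
                break
        skip_ws()
        if pos >= len(text) or text[pos] != "}":
            raise ValueError("expected '}' at position %d" % pos)
        pos += 1
        return node

    def item(node):
        nonlocal pos
        skip_ws()
        if pos < len(text) and text[pos] == "{":
            # An embedded concept such as {buy|买:agent={~}}
            node["embedded"].append(concept())
            return
        start = pos
        while pos < len(text) and text[pos] != "=":
            pos += 1
        if pos >= len(text):
            raise ValueError("expected '=' after role name")
        role = text[start:pos].strip()
        pos += 1  # skip '='
        values = node["roles"].setdefault(role, [])
        skip_ws()
        # A role may carry several values written back to back.
        while pos < len(text) and text[pos] == "{":
            values.append(concept())
            skip_ws()

    result = concept()
    skip_ws()
    if pos != len(text):
        raise ValueError("unexpected trailing text at position %d" % pos)
    return result


buyer = parse_def("{human|人:domain={commerce|商业},{buy|买:agent={~}}}")
print(buyer["head"])
print(buyer["embedded"][0]["head"])
```

In the parsed "buyer" tree the embedded concept has head `buy|买`, the plain commercial sense, which is how the definition avoids the ambiguity noted for the WordNet gloss.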

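  15. The knowledge power of HowNet: dynamic activation • Examples of commonsense reasoning with HowNet • Computing concept similarity • Establishing relevance relations between concepts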

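  16. Examples of commonsense reasoning with HowNet • Can a doctor walk? • How are the elided elements in the following sentence inferred? “我在南京买了几本很好的词典,到家发现全都丢了。” ("I bought several very good dictionaries in Nanjing, and when I got home I found they were all lost.") Who lost them? What was lost?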

  17. Can a doctor walk? (1)
     1. Definition of “doctor”: DEF={human|人:HostOf={Occupation|职位}, domain={medical|医},{doctor|医治:agent={~}}}
     2. The “entity” sememe taxonomy:
     │ │ │ ├ {AnimalHuman|动物} {animate|生物:HostOf={Sex|性别},{AlterLocation|变空间位置:agent={~}},{StateMental|精神状态:experiencer={~}}}
     │ │ │ │ ├ {human|人} {AnimalHuman|动物:HostOf={Name|姓名}{Wisdom|智慧}{Ability|能力},{think|思考:agent={~}},{speak|说:agent={~}}}

  18. Can a doctor walk? (2)
     3. The “event” sememe taxonomy:
     │ ├ {AlterLocation|变空间位置}
     │ │ ├ {SelfMove|自移}
     │ │ │ ├ {SelfMoveInManner|方式性自移}
     │ │ │ │ ├ {roam|流浪}
     │ │ │ │ ├ {walk|走}
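Slides 17 and 18 imply a chain of inference: a doctor is defined as a {human|人}; {human|人} inherits from {AnimalHuman|动物}, whose definition lists {AlterLocation|变空间位置} with agent={~}; and {walk|走} is a descendant of {AlterLocation|变空间位置} in the event taxonomy. The lookup can be sketched as below. The dictionaries hold only the fragments shown on the slides, and all variable and function names are hypothetical, not HowNet's actual interface.

```python
# Entity taxonomy fragment from slide 17: child sememe -> parent sememe.
ENTITY_PARENT = {
    "human|人": "AnimalHuman|动物",
    "AnimalHuman|动物": "animate|生物",
}

# Events that an entity sememe is the agent of, per its definition on slide 17.
AGENT_OF = {
    "AnimalHuman|动物": {"AlterLocation|变空间位置"},
    "human|人": {"think|思考", "speak|说"},
}

# Event taxonomy fragment from slide 18: child sememe -> parent sememe.
EVENT_PARENT = {
    "walk|走": "SelfMoveInManner|方式性自移",
    "roam|流浪": "SelfMoveInManner|方式性自移",
    "SelfMoveInManner|方式性自移": "SelfMove|自移",
    "SelfMove|自移": "AlterLocation|变空间位置",
}

# Head sememe of each word's definition (slide 17: "doctor" is a human).
WORD_HEAD = {"doctor": "human|人"}


def ancestors(node, parent_of):
    """Yield the node itself, then each ancestor up the taxonomy."""
    while node is not None:
        yield node
        node = parent_of.get(node)


def can_do(word, event):
    """True if some ancestor of the word's head sememe is an agent of the
    event or of one of the event's ancestors. False only means that the
    fragment above does not license the inference."""
    event_chain = set(ancestors(event, EVENT_PARENT))
    for entity in ancestors(WORD_HEAD[word], ENTITY_PARENT):
        if AGENT_OF.get(entity, set()) & event_chain:
            return True
    return False


print(can_do("doctor", "walk|走"))
```

Nothing about walking is stored on "doctor" itself; the answer is activated at query time by walking up the two taxonomies until they meet.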

  19. Axiomatic relations and role shifting - 1. "I bought several very good dictionaries in Nanjing, and when I got home I found they were all lost." (我在南京买了几本很好的词典,到家发现全都丢了。) {buy|买} <----> {obtain|得到} [consequence]; agent OF {buy|买}=possessor OF {obtain|得到}; possession OF {buy|买}=possession OF {obtain|得到}. {obtain|得到} <----> {own|有} [hypernym]; possessor OF {obtain|得到}=possessor OF {own|有}; possession OF {obtain|得到}=possession OF {own|有}.

  20. Axiomatic relations and role shifting - 2. {lose|失去} <----> {own|有} [precondition]; possessor OF {lose|失去}=possessor OF {own|有}; possession OF {lose|失去}=possession OF {own|有}. {lose|失去} <----> {obtain|得到} [mutual precondition]; possessor OF {lose|失去}=possessor OF {obtain|得到}; possession OF {lose|失去}=possession OF {obtain|得到}.
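The axioms on slides 19 and 20 let role fillers be carried from one event to another: the agent of {buy|买} becomes the possessor of {obtain|得到}, then of {own|有}, and owning is the precondition of {lose|失去}. Below is a minimal sketch of how the elided arguments of "lost" in the example sentence might be recovered. The table encodes only the axioms shown, read in the direction buy, obtain, own, lose, and `ROLE_MAP` and `shift_roles` are hypothetical names.

```python
# (source event, target event) -> {role in source: equivalent role in target}
ROLE_MAP = {
    ("buy|买", "obtain|得到"): {"agent": "possessor", "possession": "possession"},
    ("obtain|得到", "own|有"): {"possessor": "possessor", "possession": "possession"},
    ("own|有", "lose|失去"): {"possessor": "possessor", "possession": "possession"},
}


def shift_roles(frame, path):
    """Carry role fillers from the frame's event along a chain of events.

    Roles with no stated equivalent in the next event are dropped.
    """
    event, roles = frame
    for target in path:
        mapping = ROLE_MAP[(event, target)]
        roles = {mapping[r]: filler for r, filler in roles.items() if r in mapping}
        event = target
    return event, roles


# The explicit clause of the example sentence: "I bought dictionaries in Nanjing".
bought = ("buy|买", {"agent": "I", "possession": "dictionaries", "location": "Nanjing"})

event, roles = shift_roles(bought, ["obtain|得到", "own|有", "lose|失去"])
print(event, roles)
```

Under these assumptions the possessor of the losing event is the buyer and the possession is the dictionaries, which answers "who lost?" and "what was lost?" without either being stated in the second clause.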

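  21. Computing concept similarity. corrupt official (贪官) <> student (学生): 0.307692; corrupt official <> teacher (教师): 0.355556; corrupt official <> school principal (校长): 0.386667; corrupt official <> mayor (市长): 0.454545; walk <> run: 0.144444; walk <> jump: 0.144444; walk <> swim: 0.130159; walk <> fly: 0.124444; walk <> buy: 0.018605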

  22. Establishing relevance relations between concepts • A comparison with HowNet • Remarks on WordNet

  23. A comparison with HowNet. Examples: buy, 床 (bed)

  24. Remarks on WordNet (1) • On WordNet [1] Jordan Bo Boyd-Graber et al., Oct. 2005, Adding Dense, Weighted Connections to WordNet (Princeton paper) [2] Rila Mandala et al., ACL W98, 1998, The Use of WordNet in Information Retrieval (TIT paper)

  25. Remarks on WordNet (2) Princeton paper reads: “1.1 Shortcomings of WordNet • No cross-part-of-speech links [traffic (n) – stop (v)] • Too few relations [chopsticks – Chinese restaurant] • No weighted arcs [run:jog; run:move]”

  26. Remarks on WordNet (3) Princeton paper continues: To address these shortcomings, we are working to enhance WordNet by adding a radically different kind of information. The idea is to add quantified, oriented arcs between pairs of synsets, e.g. from {car, auto} to {road, route}, from {buy, purchase} to {shop, store}, and also in the opposite direction. Each of these arcs will bear a number corresponding to the strength of the relationship. We chose to use the concept of evocation – how much one concept evokes or brings to mind the other – to model the relationships between synsets.

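  27. Conclusion • Knowledge is a system of relations; • HowNet is a knowledge system that describes the relations between concepts and the relations between the attributes of concepts; • The relations HowNet describes are computable; • HowNet differs in essence from WordNet; • HowNet is still developing.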

  28. Thank you! Welcome to www.keenage.com

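  29. Appendix: universal semantic mechanisms. 跳 (jump): 跳河, jump into a river (LocationFin); 跳楼, jump off a high building (LocationIni); 跳墙, jump over a wall (LocationThru). 导 (guide): 导游 (tour guide), 导购 (shopping guide), 导诊 (hospital guide). 托 (shill): 医托 (hospital shill), 婚托 (marriage shill). 野 (outdoors): 野餐 (picnic), 野炊 (outdoor cooking), 野营 (camping), 野游 (outing), 野泳 / 野浴 (outdoor swimming / outdoor bathing)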

  30. Appendix: basic data statistics. Chinese: 06-04-07; synset: Set = 13700 (13692) (13463), Word Form = 55180 (55150) (54312); antonym: Set = 13154 (13145) (12777); converse: Set = 6803 (6804) (6753). English: synset: Set = 18622 (18610) (18575), Word Form = 58622 (58588) (58488); antonym: Set = 12269 (12268) (12032); converse: Set = 6455 (6454) (6442)

  31. Appendix: 1. Event frames ~ Verb frame
     - {event|事件}
     ├ {static|静态} {event|事件}
     │ ├ {relation|关系} {static|静态}
     │ │ ├ {possession|领属关系} {relation|关系}
     │ │ │ ├ {own|有} {possession|领属关系:possessor={*},possession={*}}
     │ │ │ │ ├ {obtain|得到} {own|有:possessor={*},possession={*},source={*}}
     └ {act|行动} {event|事件:agent={*}}
     ├ {ActGeneral|泛动} {act|行动:agent={*}}
     └ {ActSpecific|实动} {act|行动:agent={*}}
     └ {AlterSpecific|实变} {ActSpecific|实动:agent={*}}
     ├ {AlterRelation|变关系} {AlterSpecific|实变:agent={*}}
     │ ├ {AlterPossession|变领属} {AlterRelation|变关系:agent={*},possession={*}}
     │ │ ├ {take|取} {AlterPossession|变领属:agent={*},possession={*},source={*}}
     │ │ │ ├ {buy|买} {take|取:agent={*}, possession={*}, source={*}, cost={*}, beneficiary={*}

  32. Appendix: 2. Typical actors of event roles ~ VerbNet │ ├ {buy|买} {take|取:agent={human|人}{group|群体->}, possession={artifact|人工物->}, source={human|人}{InstitutePlace|场所}, cost={money|货币}, beneficiary={human|人}{group|群体->}, domain={economy|经济}}

  33. Appendix: relation types
