From Lexical Semantics to Knowledge Systems: How to infer cognitive systems from linguistic data

Presentation Transcript


  1. From Lexical Semantics to Knowledge Systems: How to infer cognitive systems from linguistic data. Chu-Ren Huang, Academia Sinica. http://cwn.ling.sinica.edu.tw/huang/huang.htm

  2. Outline • A generative lexicalist approach to grammar • From distributional data to the basic contrasts in a semantic field (or conceptual motivation for corpus distribution) • Lexical distribution as cognitive model • Radical as ontology • Language as a knowledge system ISLCC Chu-Ren Huang

  3. Introduction: A generative lexicalist approach to grammar. Back to Aristotle (through Pustejovsky) • How do we know, and what do we know: through what we experience • Qualia Structure: what we experience • Formal • Constitutive • Agentive • Telic

  4. Linguistics: What do we know about language • Qualia Structure of the Theory of Language • Formal: from Sign to Structure, Structuralism • Constitutive: from IA to IP, rule- and transformation-based theories • Agentive: UG approaches • Telic: function- and use-based theories • We need a linguistic theory that accounts for the complete knowledge structure, not just its individual aspects

  5. Towards Language as Knowledge System • Atoms of knowledge: lexicalized concepts • ‘Frames’ of knowledge: lexical semantic relations • Instantiation of knowledge: corpus • Lexicon-driven and corpus-based, to infer the knowledge structure underlying linguistic structure

  6. Three Studies • The semantic field of emotion (elaborated from Chang et al. 2000) • Lexicalized Model of Cognition (Huang and Hong 2005) • Conventionalized Ontology in Writing (Chou and Huang 2005)

  7. Semantic Field of Verbs of Emotion • Issues: Methodological • Interpretation of Distributional Data • Measuring and Interpreting lexical choices • Issues: Linguistic • Archetype Via Contrast • Why Change-of-State: • Saliency and relevance to human cognition

  8. Distributional Contrast of Verbs of Emotion: 高興 gao1xing4 (Type A) vs. 快樂 kuai4le4 (Type B) • Category: intrans. vs. trans. state verb • Function: more predicative vs. more nominalized • Collocation: CAUSE complement vs. no CAUSE • Collocation: Perfect aspect vs. no -le • Collocation (modified nouns): Eventive vs. no selection • Interpretation (Imperative): Command vs. Wish

  9. A Natural Dichotomy of Verbs of Emotion (token frequency in parentheses)

     | Subtype | Type A | Type B |
     |---|---|---|
     | Happiness | gao1xing4 高興 (669); kai1xin1 開心 (152); tong4kuai4 痛快 (40) | kuai4le4 快樂 (942); yu2kuai4 愉快 (271); xi3yue4 喜悅 (156); huan1le4 歡樂 (141); huan1xi3 歡喜 (107); kuai4huo2 快活 (48) |
     | Depression | nan2guo4 難過 (232); tong4xin1 痛心 (48) | tong4ku3 痛苦 (443); chen2zhong4 沈重 (83); ju3sang4 沮喪 (62) |

  10. A Natural Dichotomy of Verbs of Emotion (continued; token frequency in parentheses)

     | Subtype | Type A | Type B |
     |---|---|---|
     | Sadness | shang1xin1 傷心 (134) | bei1shang1 悲傷 (52) |
     | Regret | hou4hui3 後悔 (102) | yi2han4 遺憾 (198) |
     | Anger | sheng1qi4 生氣 (307) | fen4nu4 憤怒 (112); qi4fen4 氣憤 (49) |
     | Fear | hai4pa4 害怕 (261) | kong3ju4 恐懼 (149); wei4ju4 畏懼 (40) |
     | Worry | dan1xin1 擔心 (609); dan1you1 擔憂 (64); you1xin1 憂心 (46) | fan2nao3 煩惱 (199); ku3nao3 苦惱 (45) |

  11. Some Observations • Each of the seven kinds of emotion verbs shows the same dichotomy: • change-of-state vs. homogeneous state • Each side of the dichotomy is dominated by a single verb • in terms of frequency and prototypicality of meaning

  12. Semantic Field and Contrast Set • A semantic field consists of a unique covering term and a number of contrast sets (paraphrase of Grandy 1992). • The unique covering term may or may not occur in a contrast set. • All other members of the semantic field must be determined by entering into a contrast set relation with a known member of the semantic field.

  13. Observation: Chinese Defines a Property by Contrast • qing1zhong4 light+heavy = weight • da4xiao3 big+small = size • gao1ai3 tall+short = height • shi4fei1/dui4cuo4 right+wrong = affair • xiong1di4 elder+younger = brothers • zang1pi3 praise+attack = criticize • hu1xi1 exhale+inhale = breathe

  14. Our Proposal • The covering term T is either a single term or a privileged contrast set, called a contrast pair. • When T is a contrast pair, the semantic field can be defined by the shared semantic properties of the pair. • The fundamental contrast relation defining a contrast pair may be shared by a super-set of semantic fields.

  15. Our Proposal • T must enter contrast set relations with other members of the semantic field, although the contrast relation may be weakened to a marked/unmarked contrast. • The set of fundamental contrast relations is shared by all semantic fields. [cf. Semantic relations]

  16. Patterns of Distribution as Representational Clues • Numbers Don’t Lie • The pattern itself is proof that generalizations based on a single lexical item are replicable. • The uniformity and universality of the pattern across a broad but contiguous semantic field strongly favor a conceptual motivation.

  17. Functional Distribution of Type A Verbs of Emotion

     | Type A | Pred. | Nom. | N.M. |
     |---|---|---|---|
     | gao1xing4 | 85.05% | 0.30% | 1.35% |
     | nan2guo4 | 86.64% | 2.16% | 2.59% |
     | shang1xin1 | 76.12% | 2.99% | 11.19% |
     | hou4hui3 | 94.12% | 0.00% | 2.94% |
     | sheng1qi4 | 87.82% | 0.00% | 4.06% |
     | hai4pa4 | 93.10% | 3.07% | 2.68% |
     | dan1xin1 | 96.72% | 1.97% | 1.31% |
     | Average | 88.51% | 1.50% | 3.73% |

  18. Functional Distribution of Type B Verbs of Emotion

     | Type B | Pred. | Nom. | N.M. |
     |---|---|---|---|
     | kuai4le4 | 37.79% | 26.43% | 24.84% |
     | tong4ku3 | 25.73% | 45.60% | 20.54% |
     | bei1shang1 | 40.38% | 28.85% | 19.23% |
     | yi2han4 | 34.85% | 33.84% | 3.54% |
     | fen4nu4 | 28.57% | 37.50% | 17.86% |
     | kong3ju4 | 23.49% | 68.46% | 7.38% |
     | fan2nao3 | 24.12% | 69.85% | 6.03% |
     | Average | 30.70% | 44.36% | 14.21% |

  19. Preference of A verbs over B verbs in Predicative Uses

     | Verbs (A/B) | Pred. freq. (A/B) | A/B ratio |
     |---|---|---|
     | gaoxing/kuaile | 569/356 | 1.59 |
     | nanguo/tongku | 201/114 | 1.76 |
     | shangxin/beishang | 102/21 | 4.86 |
     | houhui/yihan | 96/69 | 1.39 |
     | shengqi/fennu | 238/32 | 7.44 |
     | haipa/kongju | 243/35 | 6.94 |
     | danxin/fannao | 589/48 | 12.27 |
     | Average ratio | | 5.62 |

  20. Preference of B verbs over A verbs in Nominal Uses

     | Verbs (A/B) | Nom. freq. (A/B) | B/A ratio |
     |---|---|---|
     | gaoxing/kuaile | 11/483 | 43.91 |
     | nanguo/tongku | 11/293 | 26.64 |
     | shangxin/beishang | 19/25 | 1.32 |
     | houhui/yihan | 3/74 | 24.67 |
     | shengqi/fennu | 11/62 | 5.64 |
     | haipa/kongju | 15/113 | 7.53 |
     | danxin/fannao | 20/151 | 7.55 |
     | Average ratio | | 16.75 |
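The B/A ratios in the nominal-use table can be recomputed from the raw counts. The following is a minimal sketch, not the authors' own procedure: it assumes each ratio is simply the Type B count divided by the Type A count, and that the average is an unweighted mean over the seven pairs. The names `NOMINAL_COUNTS`, `preference_ratios`, and `mean_ratio` are hypothetical.

```python
# Nominal-use counts copied from the slide: (Type A count, Type B count).
NOMINAL_COUNTS = {
    "gaoxing/kuaile": (11, 483),
    "nanguo/tongku": (11, 293),
    "shangxin/beishang": (19, 25),
    "houhui/yihan": (3, 74),
    "shengqi/fennu": (11, 62),
    "haipa/kongju": (15, 113),
    "danxin/fannao": (20, 151),
}


def preference_ratios(counts):
    """Return the B/A ratio for each near-synonym pair (assumed: B count / A count)."""
    return {pair: b / a for pair, (a, b) in counts.items()}


def mean_ratio(ratios):
    """Unweighted mean of the per-pair ratios (assumed averaging method)."""
    return sum(ratios.values()) / len(ratios)


ratios = preference_ratios(NOMINAL_COUNTS)
for pair, ratio in ratios.items():
    print(f"{pair}: {ratio:.2f}")
print(f"average: {mean_ratio(ratios):.2f}")
```

Under these assumptions the per-pair values and the average agree with the figures on the slide to two decimal places.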

  21. Summary of the Likelihood Ratio Data • A clear lexical preference between near-synonyms is established. • Predicative preference and deverbal preference tend to compensate for each other to establish the contrast. • Overall, the deverbal preference seems to be the defining feature of the dichotomy. [Note that these are all verbs.]

  22. Deverbal Use Frequency of Type A Verbs

     | Verb | Deverbal use |
     |---|---|
     | tong4kuai4 痛快 | 0.00% |
     | gao1xing4 高興 | 1.65% |
     | hou4hui3 後悔 | 2.94% |
     | dan1xin1 擔心 | 3.28% |
     | sheng1qi4 生氣 | 3.58% |
     | tong4xin1 痛心 | 4.17% |
     | nan2guo4 難過 | 4.75% |
     | hai4pa4 害怕 | 5.75% |
     | you1xin1 憂心 | 6.52% |
     | kai1xin1 開心 | 7.89% |
     | dan1you1 擔憂 | 9.38% |
     | shang1xin1 傷心 | 14.18% |

  23. Deverbal Use Frequency of Type B Verbs

     | Verb | Deverbal use |
     |---|---|
     | qi4fen4 氣憤 | 24.49% |
     | wei4ju4 畏懼 | 25.00% |
     | yu2kuai4 愉快 | 29.89% |
     | huan1xi3 歡喜 | 30.84% |
     | kuai4huo2 快活 | 33.33% |
     | ju3sang4 沮喪 | 33.87% |
     | yi2han4 遺憾 | 37.38% |
     | ku3nao3 苦惱 | 46.67% |
     | bei1shang1 悲傷 | 48.08% |
     | chen2zhong4 沈重 | 48.19% |
     | kuai4le4 快樂 | 51.27% |
     | fen4nu4 憤怒 | 55.36% |
     | tong4ku3 痛苦 | 66.14% |
     | kong3ju4 恐懼 | 75.84% |
     | fan2nao3 煩惱 | 75.88% |
     | xi3yue4 喜悅 | 92.20% |
     | huan1le4 歡樂 | 92.91% |

  24. Deverbal Use Frequency as a Benchmark for Type A/B Verbs • A gap of more than 10 percentage points separates the lowest Type B verb (qi4fen4 氣憤, 24.49%) from the highest Type A verb (shang1xin1 傷心, 14.18%). • The smallest gap between a competing pair is almost 34 percentage points (shang1xin1 傷心 14.18% vs. bei1shang1 悲傷 48.08%).
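The benchmark can be illustrated with a small classifier over the deverbal-use percentages listed for the Type A and Type B verbs. This is only a sketch: the slides report the gap but do not state a cutoff, so the threshold used here (the midpoint between the highest Type A value and the lowest Type B value) is an assumption, as are the names `TYPE_A`, `TYPE_B`, `THRESHOLD`, and `classify`.

```python
# Deverbal-use percentages copied from the Type A and Type B slides.
TYPE_A = {
    "tong4kuai4": 0.00, "gao1xing4": 1.65, "hou4hui3": 2.94, "dan1xin1": 3.28,
    "sheng1qi4": 3.58, "tong4xin1": 4.17, "nan2guo4": 4.75, "hai4pa4": 5.75,
    "you1xin1": 6.52, "kai1xin1": 7.89, "dan1you1": 9.38, "shang1xin1": 14.18,
}
TYPE_B = {
    "qi4fen4": 24.49, "wei4ju4": 25.00, "yu2kuai4": 29.89, "huan1xi3": 30.84,
    "kuai4huo2": 33.33, "ju3sang4": 33.87, "yi2han4": 37.38, "ku3nao3": 46.67,
    "bei1shang1": 48.08, "chen2zhong4": 48.19, "kuai4le4": 51.27,
    "fen4nu4": 55.36, "tong4ku3": 66.14, "kong3ju4": 75.84, "fan2nao3": 75.88,
    "xi3yue4": 92.20, "huan1le4": 92.91,
}

# Hypothetical cutoff: midpoint of the gap between the two groups.
THRESHOLD = (max(TYPE_A.values()) + min(TYPE_B.values())) / 2


def classify(deverbal_pct, threshold=THRESHOLD):
    """Label a verb Type B if its deverbal use exceeds the cutoff, else Type A."""
    return "B" if deverbal_pct > threshold else "A"


# Gap between the groups, and gap within the closest competing pair.
group_gap = min(TYPE_B.values()) - max(TYPE_A.values())
pair_gap = TYPE_B["bei1shang1"] - TYPE_A["shang1xin1"]
print(f"group gap: {group_gap:.2f} points, closest pair gap: {pair_gap:.2f} points")
```

Because the two groups do not overlap, any cutoff inside the gap separates them equally well; the midpoint is chosen only for concreteness.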

  25. The Noisy-Channel Model of Theory of Communication • Our Proposal • Language is an information-based communication system. • An optimized communication system is where all redundant signs (for one piece of information) also minimally differentiate another piece of information.

  26. Re-Interpretation of the Data • Members of the same semantic field in general, and a near-synonym pair in particular, are competing signs to express information pertaining to the field. • A sign is chosen to represent a piece of information because it expresses that piece of information most effectively.

  27. Re-Interpretation of the Data • This preference for expressing certain information can be lexicalized to establish logical implicature. • Once that lexical preference is established, linguists could use the preferential ratio to infer the lexical information being carried.

  28. Lexical distribution as cognitive model: Senses • A further step based on property defined by contrast, with focus on how senses are represented • Study the sense of hearing and the basic property term of sheng-yin ‘sound/voice’ • We (Huang and Hong 2005) look at the distribution of these two lexical elements in all derived words

  29. 聲 Sheng vs. 音 Yin

     | 聲 sheng | 音 yin | Gloss |
     |---|---|---|
     | 聲樂 | 音樂 | vocal music vs. music |
     | 發聲 | 發音 | make a sound vs. articulate |
     | 高聲 | 高音 | loudly vs. high pitch |
     | *噪聲 | 噪音 | noise |
     | 大聲 | *大音 | loudly |

  30. NN Compound (N + *) • 聲 Sheng + source: 歌 掌 人 腳步 風 鐘 水 … • 音 Yin + quality: 嗓 鄉 喉 裝飾 尾 哨 …
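One way to look at the distribution of 聲 and 音 across derived words is to group compounds by their final morpheme and compare the modifiers each one takes. The sketch below is illustrative only and is not the procedure of Huang and Hong (2005): the word list is assembled from the examples on the slides, the compounds are assumed to have the form modifier + 聲/音, and `WORDS`, `HEADS`, and `modifiers_by_head` are hypothetical names.

```python
from collections import defaultdict

# Compounds built from the examples on the slides (modifier + 聲 or 音).
WORDS = [
    "歌聲", "掌聲", "人聲", "腳步聲", "風聲", "鐘聲", "水聲", "高聲", "大聲", "發聲",
    "嗓音", "鄉音", "喉音", "裝飾音", "尾音", "哨音", "高音", "噪音", "發音",
]
HEADS = ("聲", "音")


def modifiers_by_head(words, heads=HEADS):
    """Map each head morpheme to the set of modifiers that precede it."""
    groups = defaultdict(set)
    for word in words:
        for head in heads:
            # Skip the bare morpheme itself; keep only modifier + head forms.
            if len(word) > len(head) and word.endswith(head):
                groups[head].add(word[: -len(head)])
    return groups


groups = modifiers_by_head(WORDS)
only_sheng = groups["聲"] - groups["音"]   # modifiers seen only with 聲
only_yin = groups["音"] - groups["聲"]     # modifiers seen only with 音
shared = groups["聲"] & groups["音"]       # modifiers seen with both

print("only 聲:", sorted(only_sheng))
print("only 音:", sorted(only_yin))
print("shared:", sorted(shared))
```

On a real corpus lexicon the same grouping would be run over all attested compounds; the modifiers exclusive to one head are the ones that carry the source versus quality contrast.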

  31. The Semantic Contrast • 聲: production of sounds; often refers to the manner or source of how a sound was made • 音: perception of a sound; often refers to the sound quality or how a sound is perceived by an intelligent agent

  32. A Lexicalized Schema for Hearing in Chinese (from Huang and Hong 2005) • Process of Hearing • 聲 sheng: 起點、來源 (starting point, source); 主動完成 (active production); 發動者 (instigator) • 音 yin: 終點、結果 (endpoint, result; goal); 被動接收 (passive reception); 經驗者 (experiencer)

  33. A Lexicalized Schema for Sense in Chinese • Process of Sensation: word1, word0 • 經驗者 (experiencer) • Goal/perception: experience of sense • 感知接收 (sensation)

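  34. Lexicon: Contrast of Cognitive Features (詞彙 認知特徵的對比). Lexical semantic analysis (7): diagram of the relations among vision, touch, and hearing

     | Sense | Instigator of action (感覺發動者), marked | Experiencer of sensation (感覺經驗者), shared and unmarked |
     |---|---|---|
     | Hearing (聽覺) | 聲 (production) | 音 (perception) |
     | Vision (視覺) | 看 (inchoative) | 見 (bounded result) |
     | Touch (觸覺) | 觸 (activity) | 摸 (incremental theme) |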

  35. Radical as ontology • The Chinese writing system has been conventionalized and shared for over three thousand years • It has also been adopted by typologically very different languages • If the radical system is a system of conceptualization, then it is the most robust and most widely used ontology

  36. Example: the horse radical (from Chou 2005) • 馬 is a semantic symbol of horse • Examples: • 驩:馬名 a kind of horse • 驫:眾馬 horses • 騎:騎馬 riding a horse • 驍:良馬 a good horse • 驚:馬驚 a scared horse

  37. Research Tool and Issue • Formal Description • IEEE SUMO (Suggested Upper Merged Ontology) http://www.ontologyportal.org http://BOW.sinica.edu.tw • Issue: Why are Chinese radicals usually considered an imperfect and misleading taxonomy?

  38. Knowledge System of the Radical 艸/艹 (Grass, for Plants) • Plants (IS-A): 蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草 • Usage (telic): 蕃藥蔬菜薪苑藩藉茭 • Description (descriptive/formal): 茲蒼芳落茸茂荒薄芬蒸莊 • Parts (constitutive): 萌莖芽茄苗蓮葉

  39. Conclusion I: Corpus as Evidence • Core issue of a scientific explanation of language and cognition • Language as a living organism allows variations and adaptations (the evolutionary view) • The coherence of language is the shared tendency of all users • Distributional data in a corpus lead to the discovery of these shared tendencies • Such data should be more valuable than incidental examples

  40. Conclusion II: Language as a Knowledge System • The generative lexicalist approach to grammar: language as a knowledge system • All aspects of Language are projected from a unified knowledge system • Lexical semantics based on distributional data offers the best window to the underlying knowledge system of language
