“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI FACULTY OF COMPUTER SCIENCE The Semantics and Pragmatics of Natural Language Daniela GÎFU http://profs.info.uaic.ro/~daniela.gifu/
Course 1: The General Presentation
Main Concepts - I
• 1. Natural Language
  - used by human beings for communication...
  - sign, system, symbols, ruleset (or grammar)
• 2. Semantics
  - word meaning, causes of word change...
• 3. Pragmatics
  - how language is used by a speaker in a given context, with the intention of acting in a particular way and producing certain effects on the interlocutor...
Main Concepts - II
• 1. Natural Language: e.g. Standard Romanian, as codified by the Romanian Academy
• Types:
  a) speech (spoken language) - produced by articulate sounds
  b) writing (written language) - the representation of a spoken or gestural language
Main Concepts - II 2. Semantics Types: a) denotational; b) axiomatic; c) operational; d) action; e) categorical; f) concurrency; g) game; h) predicate transformational
Main Concepts - III
• 3. Pragmatics in correlation with Ambiguity
  = the study of what people mean but do not explicitly say...
• "You have a green light"
  - No context; - No identity of the speaker; - No speaker's intent
• The meaning?
  - the space that belongs to you has green ambient lighting?!
  - you are driving through a green traffic signal?!
  - you no longer have to wait to continue driving?!
  - you are permitted to proceed in a non-driving context?!
  - your body is cast in a greenish glow?!
  - you possess a light bulb that is tinted green?!
• The interaction between context and interpretation must be automated.
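As a rough illustration of what automating the link between context and interpretation could look like, the sketch below selects a reading of the example utterance from a hand-written table. The context labels and the `interpret` function are hypothetical, invented here for illustration; a real system would have to infer the context rather than receive it as a label.

```python
# Hypothetical context labels mapped to readings of the example utterance.
READINGS = {
    "driving": "you are driving through a green traffic signal",
    "waiting_at_junction": "you no longer have to wait to continue driving",
    "approval": "you are permitted to proceed in a non-driving context",
    "interior_lighting": "the space that belongs to you has green ambient lighting",
}

def interpret(utterance, context=None):
    """Return the readings compatible with `context` (all of them if it is unknown)."""
    if utterance != "You have a green light":
        raise ValueError("this toy table covers a single utterance")
    if context in READINGS:
        return [READINGS[context]]
    # Without context the utterance stays ambiguous: every reading survives.
    return sorted(READINGS.values())

print(interpret("You have a green light", "driving"))
print(len(interpret("You have a green light")))
```

With a known context a single reading is returned; with no context the function returns every reading, which mirrors the ambiguity listed on the slide.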
Formal language I*
• 1. Symbol
  - a character, an abstract entity that has no meaning by itself
  - Ex: letters, digits and special characters
• 2. Alphabet
  - a finite set of symbols
  - often denoted by Σ
  - Ex:
    B = {0, 1} says that B is an alphabet of two symbols, 0 and 1
    C = {a, b, c} says that C is an alphabet of three symbols, a, b and c
* More about formal language: http://www.its.caltech.edu/~matilde/FormalLanguageTheory.pdf
Formal language II
• 3. String or word
  - a finite sequence of symbols from an alphabet
  - Ex:
    01110 and 111 are strings over the alphabet B above
    aaabccc and b are strings over the alphabet C above
• 4. Sentence
  - a string of words.
  - Ex: I saw the gentleman with the hat.
  - String = a b c d e c f (each distinct word maps to one symbol, so the repeated word "the" is c both times)
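The definitions of alphabet and string can be checked mechanically. The sketch below is a minimal illustration; the helper name `is_string_over` and the small word-level vocabulary are assumptions made for this example, not part of the course material.

```python
# Alphabets from the examples: finite sets of symbols.
B = {"0", "1"}
C = {"a", "b", "c"}

def is_string_over(sequence, alphabet):
    """True if every element of `sequence` belongs to `alphabet`.

    The empty sequence qualifies: it is the empty string.
    """
    return all(symbol in alphabet for symbol in sequence)

print(is_string_over("01110", B))    # symbols are characters
print(is_string_over("aaabccc", C))
print(is_string_over("012", B))      # "2" is not in B

# A sentence is a string whose symbols are words (hypothetical vocabulary).
VOCABULARY = {"I", "saw", "the", "gentleman", "with", "hat"}
sentence = "I saw the gentleman with the hat".split()
print(is_string_over(sentence, VOCABULARY))
```

The same function covers both cases because a Python string is a sequence of characters and a tokenized sentence is a sequence of words.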
Formal language III
How can the possible relations between the parts of a string be defined?
A. [I] saw the gentleman [with the binoculars] = [a] b c d [e c f] (the speaker uses the binoculars)
B. I saw [the gentleman with the binoculars] = a b [c d e c f] (the gentleman has the binoculars)
We can represent these structures with trees…
[Figure: two parse trees for the sentence "I saw the gentleman with the binoculars."]
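The two structures can also be written down as nested tuples, which is one simple way to encode a tree. This is only a sketch: the exact bracketing below is an assumption chosen to reflect readings A and B, not a reconstruction of the trees on the original slide.

```python
# Reading A: "with the binoculars" attaches to the seeing (the speaker has them).
reading_a = ("I", ("saw", ("the", "gentleman"), ("with", ("the", "binoculars"))))

# Reading B: "with the binoculars" attaches to "the gentleman" (he has them).
reading_b = ("I", ("saw", (("the", "gentleman"), ("with", ("the", "binoculars")))))

def leaves(tree):
    """Return the words of a nested-tuple tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]

# Same string of words, different structure.
print(" ".join(leaves(reading_a)))
print(leaves(reading_a) == leaves(reading_b))
print(reading_a == reading_b)
```

Both trees flatten to the same sentence, so the ambiguity is invisible in the string and only shows up in the structure.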
Formal language IV
• 5. Language
  - a set of strings of symbols from an alphabet.
• 6. Natural language (or ordinary language)
  - open-ended; built on three different knowledge components: the sound of words - phonology; the meaning of words - semantics; the grammatical rules according to which words are put together - syntax.
• 7. Formal language
  - a set L of sequences/strings over some finite alphabet Σ
  - described using formal grammars (sets of rules for forming strings, specific to the language).
  - many applications (e.g. the Prognosis wearable system)
Formal language V
Context-Free Grammars (CFG) - a finite set of grammar rules
https://www.tutorialspoint.com/automata_theory/context_free_grammar_introduction.htm
A CFG is a quadruple (N, T, P, S), where:
• N = a finite set of non-terminal symbols (characters or variables). Note! Each n ∈ N stands for a type of phrase/clause in the sentence.
• T = a finite set of terminals (the alphabet defined by the grammar), disjoint from N: N ∩ T = ∅.
• P = a finite set of (rewrite) rules or productions of the grammar, each rewriting a single non-terminal into a string of non-terminals and terminals: N → (N ∪ T)*. Note! The left-hand side of a production rule does not have any right context or left context.
• * = the Kleene star operation, a unary operation on sets of strings or on sets of symbols or characters; applied to a set N it is written N* (it is also used in regular expressions). Ex: {"a", "b", "c"}* = {ε, "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}, where ε is the empty string and {ε} is the language consisting only of the empty string.
• S = the start symbol (S ∈ N), used to represent the whole sentence.
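The definitions above can be tried out on a toy grammar. The sketch below enumerates a Kleene star, checks the shape of a quadruple (N, T, P, S), and counts how many parse trees a sentence has. The grammar rules and all function names are invented for illustration, and the parse counter assumes a grammar with no ε-rules and no cycles of unit rules.

```python
from functools import lru_cache
from itertools import islice, product

def kleene_star(symbols):
    """Yield every string over `symbols`, shortest first, starting with ε ('')."""
    if not symbols:
        yield ""  # the star of the empty set contains only ε
        return
    length = 0
    while True:
        for combo in product(sorted(symbols), repeat=length):
            yield "".join(combo)
        length += 1

# Toy grammar (hypothetical rules, written for this example only).
N = {"S", "NP", "VP", "PP"}
T = {"I", "saw", "the", "gentleman", "with", "binoculars"}
P = {
    "S": [["NP", "VP"]],
    "NP": [["I"], ["the", "gentleman"], ["the", "binoculars"], ["NP", "PP"]],
    "VP": [["saw", "NP"], ["VP", "PP"]],
    "PP": [["with", "NP"]],
}
S = "S"

def is_context_free(N, T, P, S):
    """Check N ∩ T = ∅, S ∈ N, and that every rule maps one n ∈ N into (N ∪ T)*."""
    if N & T or S not in N:
        return False
    return all(lhs in N and all(sym in N | T for sym in rhs)
               for lhs, alternatives in P.items() for rhs in alternatives)

def count_parses(words, N, P, S):
    """Count the parse trees of `words` (assumes no ε-rules, no unit-rule cycles)."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        if symbol not in N:  # terminal: must match exactly one word
            return 1 if j == i + 1 and words[i] == symbol else 0
        return sum(derive_seq(tuple(rhs), i, j) for rhs in P.get(symbol, []))

    @lru_cache(maxsize=None)
    def derive_seq(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        # Every remaining symbol must cover at least one word.
        return sum(derive(head, i, k) * derive_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return derive(S, 0, len(words))

print(list(islice(kleene_star({"a", "b", "c"}), 6)))  # ['', 'a', 'b', 'c', 'aa', 'ab']
print(is_context_free(N, T, P, S))                    # True
print(count_parses("I saw the gentleman with the binoculars".split(), N, P, S))  # 2
```

Under this toy grammar the sentence receives two parse trees, one where the prepositional phrase attaches to the verb phrase and one where it attaches to the noun phrase.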
Natural Language Processing – a subdomain of Artificial Intelligence and Linguistics
• 1. Thematic Areas
  - Linguistics: mathematical linguistics, computational linguistics
  - Formal Language
  - Linguistic and Language Processing
  - The grammatical structure of utterances: the sentence, constituents, phrase, classifications and structural rules, syntactic processing ...
  - Parser
  - Semantics & Pragmatics
• Mathematical linguistics - the study of mathematical structures and methods that are of importance to linguistics → Phonetics, → Phonology, → Morphology, → Syntax, → Semantics, and… → Sociolinguistics, → Language Acquisition.
• Computational linguistics - the scientific and engineering discipline concerned with understanding written and spoken language from a computational perspective.
  - detecting synonymy (Grigonytė et al., 2010);
  - developing WordNet (Gala et Mititelu, 2013), (Iftene and Balahur, 2007)...;
  - WSD, word sense disambiguation (Yang, H. et al. 2010), (Lefever et Hoste, 2010), (Tufiș, 2002)...;
  - semantic annotation (Garcia et al., 2012)...;
  - reconstructing a diachronic morphology (Cristea et al., 2007/2012);
  - diachronic text classification (Mihalcea and Năstase, 2012; Popescu and Strapparava, 2015), etc.
Linguistic and Language Processing • 1. Linguistics • Science of language. Includes: • Sounds (phonology) • Word formation (morphology) • Sentence structure (syntax) • Meaning (semantics) and understanding (pragmatics)… • 2. Levels of linguistic analysis • Higher level → Speech Recognition (SR) • Lower levels → Natural Language Processing (NLP)
Levels of Linguistic Analysis
Acoustic signal
• Phonetics – production and perception of speech (SR)
Phones
• Phonology – sound patterns of language (SR)
Letters - strings
• Lexicon – dictionary of words in a language (NLP)
Morphemes
• Morphology – word formation and structure (NLP)
Words
• Syntax – sentence structure (NLP)
Phrases & sentences
• Semantics – intended meaning (NLP)
Meaning out of context
• Pragmatics – understanding from external info (NLP)
Meaning in context
Steps of NLP
• 1. Morphological and Lexical Analysis
  - Lexicon
  - Morphology – identification, analysis and description of the structure of words
  - Words – the smallest units of syntax
  - Syntax – the rules / principles that govern the sentence structure of any language
  - Lexical analysis – dividing text into paragraphs, sentences and words
• 2. Syntactic analysis
  - Analysis of the words in a sentence to determine the grammatical structure of the sentence
  - Ex: “Boy the go the store” – correct?
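Lexical analysis, as described above, divides text into paragraphs, sentences and words. The sketch below does this with regular expressions from the Python standard library. The splitting rules are deliberately naive assumptions (for example, any ".", "!" or "?" followed by whitespace ends a sentence), so abbreviations such as "e.g." would be split incorrectly.

```python
import re

def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

def split_words(sentence):
    """Keep runs of word characters, allowing one internal apostrophe."""
    return re.findall(r"\w+(?:'\w+)?", sentence)

text = "Fred had just been sacked. He asked for the boss.\n\nDo you know what time it is?"

for paragraph in split_paragraphs(text):
    for sentence in split_sentences(paragraph):
        print(split_words(sentence))
```

Each printed list is the word sequence that a later syntactic analysis step would take as input.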
Steps of NLP
• 3. Semantic Analysis
  - Derives an absolute (dictionary-definition) meaning, out of context
  - The structures created by the syntactic analyzer are assigned meaning: a mapping is made between the syntactic structures and objects in the task domain.
  - Ex: “Colourless green ideas…” – correct?
• 4. Discourse Integration
  - The meaning of an individual sentence may depend on the sentences that precede it and may influence the meaning of the sentences that follow it.
  - Ex: the word “it” in the sentence “you wanted it” depends on the prior discourse context.
Steps of NLP
• 5. Pragmatic analysis
  - Derives knowledge from external commonsense information
  - Means understanding the purposeful use of language in situations, particularly those aspects of language which require world knowledge
  - What was said is reinterpreted to determine what was actually meant.
  - Ex: “Do you know what time it is” – should be interpreted as a request.
Semantics and pragmatics (S & P)
• 1. S & P
  - two stages of analysis concerned with getting at the meaning of a sentence;
  - 1st – S – a partial representation of the meaning, based on the possible syntactic structure(s) of the sentence and the meanings of the words in that sentence;
  - 2nd – P – the meaning based on contextual and world knowledge.
Semantics and pragmatics (S & P)
• Example of the difference: “He asked for the boss”.
• From semantics alone we can work out that:
  - Someone (who is male) asked for someone who is a boss.
  - We can’t say who these people are or why the first person wanted the second.
• If we know something about the context (including the last few sentences spoken/written), we may be able to work these things out:
  - Maybe the last sentence was: “Fred had just been sacked”.
  - From general knowledge: bosses generally sack people, and if people want to speak to those who sacked them, it is generally to complain about it.
  - We could then really start to get at the meaning of the sentence: “Fred wants to complain to his boss about getting sacked”.
Homework:
• Each student has to present a paper related to his/her SEMEVAL task, which guides the final project (https://aclweb.org/anthology/)
• published between 2011 and 2017
• EMNLP (Conference on Empirical Methods in Natural Language Processing)
• ACL (Association for Computational Linguistics)
• EACL (European Chapter of the Association for Computational Linguistics)
• COLING (International Conference on Computational Linguistics)
Final project: SEMEVAL 2018
Groups of 2 students (1 humanities student and 1 computer science student) prepare a paper for SEMEVAL-2018 based on their research, under continuous supervision - http://alt.qcri.org/semeval2018/index.php?id=tasks
Read the editing guide for a scientific paper: http://libguides.usc.edu/writingguide/purpose
Teams:
• Iuliana-Alexandra Fleșcan-Lovin-Arseni & Sandra-Maria Amarandei & Ramona-Andreea Turcu
• Mihaela Plămadă-Onofrei & Ionuț Hulub
• Raluca Preisler & Ștefan Oprea
• Larisa Alexa & Alina-Beatrice Lorenț
• Andreea Hrițcu & Grigore Ioniță (cleaning a Verb Dictionary for English)