200 likes | 238 Views
“AL EXANDRU I OAN CUZA” UNIVERSIT ATY OF IAŞI FACULT Y OF COMPUTER SCIENCE. The Semantics and Pragmatics of Natural Language Daniela G ÎFU http://profs.info.uaic.ro/~daniela.gifu/. Course1. The General Presentation. Main Concepts. 1. Natural Language
E N D
“ALEXANDRU IOAN CUZA” UNIVERSITATY OF IAŞI FACULTY OF COMPUTER SCIENCE The Semantics and Pragmatics of Natural Language Daniela GÎFU http://profs.info.uaic.ro/~daniela.gifu/
Course1 The General Presentation
Main Concepts • 1. Natural Language • used by human beings for communication... • sign, system, symbols, ruleset (or grammar) • 2. Semantics • word meaning, causes of words change ... • 3. Pragmatics • how language is used by a emitent in a given context, with the intention to act in a determined mode and with certain effects on the interlocutor ...
Natural Language Processing – a subdomain of Artificial Intelligence and Linguistics • 1. Thematic Areas • Linguistics - mathematical linguistics - computational linguistics • Formal Language • Linguistic and Language Processing • The grammatical structure of utterances: the sentence, constituents, phrase, classifications and structural rules, syntactic processing ... • Parser • Semantics& Pragmatics
Mathematical linguistics - the study of mathematical structures and methods that are of importance to linguistics → Phonetics, → Phonology, → Morphology, → Syntax, and → Semantics, → and… Sociolinguistics → Language Acquisition. • Computational linguistics - the scientific and engineering discipline concerned with understanding written and spoken language from a computational perspective. • - detecting synonymy (Grigonytė et al., 2010); • - developing WordNet (Gala et Mititelu, 2013), (Iftene and Balahur, 2007)...; • WSD(Yang, H. et al. 2010), (Lefever et Hoste, 2010), (Tufiș,2002)...; • semantic annotation(Garcia et al., 2012)...; • reconstructing a diachronic morphology (Cristea et al., 2007/2012) • diachronic text classification(Mihalcea and Năstase, 2012; Popescu and Strapparava, 2015), etc.
Formal language • 1. Symbol • a character, an abstract entity that has no meaning by itself • Ex: lettters, digits and special characters • 2. Alphabet • finite set of symbols • often denoted by Σ • Ex: B = {0, 1} says B is an alphabet of two symbols, 0 and 1 • C = {a, b, c} – C an alphabet of 3 symbols, a, b and c
Formal language • 3. String or a word • a finite sequence of symbols from an alphabet • Ex: 01110 and 111 are strings from the alphabet B above • aaabccc and b are strings from the C above • 4. Language • a set of strings from an alphabet • 5. Formal language (or simply language) • a set L of strings over some finite alphabet Σ • described using formal grammars
Linguistic and Language Processing • 1. Linguistics • Science of language. Includes: • Sounds (phonology) • Word formation (morphology) • Sentence structure (syntax) • Meaning (semantics) and understanding (pragmatics)… • 2. Levels of linguistic analysis • Higher level → Speech Recognition (SR) • Lower levels → Natural Language Processing (NLP)
Levels of Linguistic Analysis Acoustic signal Phonetics – production and perception of speech Phones Phonology – Sound patterns of language SR Letters - strings Lexicon – Dictionary of words in a language Morphemes Morphology – Word formation and structure Words Syntax – Sentence structure Phrases & sentences Semantics – Intended meaning NLP Meaning out of context Pragmatics – Understanding from external info Meaning in context
Steps of NLP • 1. Morphological and Lexical Analysis • Lexicon • Morphology – identification, analysis and description of structure of words • Words – the smallest units of syntax • Syntax – the rules / principles that govern the sentence structure of any language • Lexical analysis – dividing text into paragraphs, sentences and words • 2. Syntactic analysis • Analysis of words in a sentence, knowing the grammatical structure of the sentence • Ex: Boy the go the store – correct?
Steps of NLP • 3. Semantic Analysis • Derives an absolute (dictionary definition) meaning from the context • The structure created by the syntactic analyzer are assigned meaning. A mapping is made between the syntactic structure and objects in the task domain. • Ex: “Colourless green ideas…” – correct? • 4. Discourse Integration • The meaning of an individual sentence may depend on the sentences that precede it and may influence the meaning of the sentences that follow it. • Ex: the word “it” in the sentence, “you wanted it” depends on the prior discourse context.
Steps of NLP • 5. Pragmatic analysis • Derives knowledge from the external commonsense information • Means understanding the purposeful use of language in situations particularly those aspects pf language which require world knowledge • What was said is reinterpreted to determine what was actually meant. • Ex: “Do you know what time it is” – should be interpreted as a request.
Semantics and pragmatics (S & P) • 1. S & P • 2 stages of analysis concerned with getting at the meaning of a sentence; • 1st – S – a partial representation of the meaning based on the possible syntactic structure(s) of the sentence and the meanings of the words in that sentence; • 2nd – P – the meaning based on the contextual and the world knowledge.
Semantics and pragmatics (S & P) • 1. Ex. for differences: • “He asked for the boss”. • We can work out that: • Someone (who is male) asked for someone who is a boss. • We can’t say who these people are and why the first guy wanted the second. • If we know something about the context (including the last few sentences spoken/written) we may be able to work these things out. • Maybe the last sentence was: “Fred had just been sacked”. • From our general knowledge that bosses generally sack people: if people want to speak to people who sack them it is generally to complain about it. • 6. We could then really start to get at the meaning of the sentence: “Fred wants to complain to his boss about getting sacked”.
Homework: • Each student has to present a paper about clustering texts that guide final project - (https://aclweb.org/anthology/) • între 2010-2016 • Platformele: • LREC (Language Resources and Evaluation Conference) • ACL (Association of Computational Linguistics) • EACL (European Association of Computational Linguistics) • Coling (International Conference on Computational Linguistics)
Other references… • Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, Rabab Ward (2015) Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval. In https://arxiv.org/pdf/1502.06922.pdf • Kate Cohen, Fredrik Johansson, Lisa Kaati, and Jonas Mork, (2014) Detecting Linguistic Markers for Radical Violence in Social Media, Terrorism and Political Violence26, no. 1 : 246-256. • Joel Brynielsson, Andreas Horndahl, Fredrik Johansson, Lisa Kaati, Christian Martenson, and Pontus Svenson. (2013). Harvesting and Analysis of Weak Signals for Detecting Lone-Wolf Terrorists. Security Informatics 2, no. 11 (2013), accessed May 15, 2016, http://www.security-informatics.com/content/2/1/11; • Alexander V. Mamishev and Murray Sargent. (2013). Creating Research and Scientific Documents Using Microsoft Word. Microsoft Press, Redmond, WA. • Sean M. Gerrish and David M. Blei. (2010). A language-based approach to measuring scholarly impact. In Proceedings of International Conference of Machine Learning.
Alexander V. Mamishev and Sean D. Williams. 2010. Technical Writing for Teams: The STREAM Tools Handbook. Wiley-IEEE Press, Hoboken, NJ. • Jonas Muller, Aditya Thyagarajan (2016). Siamese Recurrent Architectures for Learning Sentence Similarity. In Proceedings of AAAI-16 • Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom (2015) Reasoning about entailment with Neural Attention. IN Proceedings of ICLR, http://arxiv.org/abs/1509.06664 • Xiaofeng Wang, Matthew S. Gerber, and Donald E. Brown. 2012. Automatic Crime Prediction using Events Extracted from Twitter Posts. SBP, LNCS 7227:231-238. • Yaser Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin. (2012). Learning From Data, amlbook.com. • Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, Hongwei Hao (2015) Short Text Clustering via Convolutional Neural Networks. In Proceedings of NAACL-HLT 2015, 62–69 • Trevor Hastie, Robert Tibshirani, Jerome Friedman. (2008). The Elements of Statistical Learning. Data Mining, Inference, and Prediction,2nd ed., Springer.
Final project: Implementing a tool for text clustering, including diachronic perspective (demo & Web resource) - SEMANTRIA model • Mixed teams (linguists + informaticians) • - Building corpus: • http://www.bbc.com/news – English • http://www.e-ziare.ro/ – Romanian • - NER • - Topic - Detection – LDA • Domain & subdomains detection • Sentiment Analysis • Automatic News Source Detection • - Interface Construction