
Exploring Language and Computing: An Introduction to Computational Linguistics

This lecture introduces the intersection of language and computers, covering topics like Natural Language Processing (NLP), generative grammar, and formal languages. It discusses key disciplines including phonetics, morphology, syntax, semantics, and pragmatics. Theoretical concepts like formal grammar, ambiguity management, and information theory are also explored. The lecture emphasizes understanding the structure of language and its impact on meaning, as well as the application of computational models to linguistic analysis. The work of linguist Noam Chomsky and the development of generative grammar are highlighted to provide a foundational understanding of computational linguistics.

salberta

Presentation Transcript


  1. Computational Linguistics INTroduction, Lecture 1: Computers and Language

  2. Course Information
     • Course Website: http://staff.um.edu.mt/mros1/lin2160
     • Lecturers: mike.rosner@um.edu.mt, ray.fabri@um.edu.mt
     • Book: Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2009, ISBN 978-0-13-504196-3
     • Natural Language Toolkit (NLTK): http://www.nltk.org/
     CLINT - Lecture 1

  3. CL: Two Main Disciplines
     [Diagram labels: LINGUISTICS, COMP SCI, language and computers]

  4. Language and Computers includes …
     • Natural Language Processing (NLP)
         • Computational models of language analysis, interpretation, and generation.
         • Syntax/semantics interface
     • Human Language Technology
         • Emphasis on large-scale performance
         • Example 1: Google search
         • Example 2: speech technology
     • Computational Linguistics
         • Emphasis on mechanised linguistic theories.
         • Grew out of early Machine Translation efforts

  5. Linguistics
     • Phonetics: The study of speech sounds
     • Phonology: The study of sound systems
     • Morphology: The study of word structure
     • Syntax: The study of sentence structure
     • Semantics: The study of meaning
     • Pragmatics: The study of language use

  6. Noam Chomsky
     • Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
     • Chomsky has been the dominant figure in linguistics ever since.
     • Chomsky invented the generative approach to grammar.

  7. Generative Grammar: Some Key Points
     • The theory of grammar includes a mathematical definition of what a grammar is.
     • A language is a (possibly infinite) set of sentences.
     • But a grammar is finite.
     • A grammar generates all and only the sentences of a language.
         • Undergeneration
         • Overgeneration
     [source: Sag & Wasow]

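  8. Generative Power of a Grammar
     [Diagrams comparing the language L with the set G generated by the grammar]
     • Overgeneration: G generates all but not only the sentences of L
     • Undergeneration: G generates only but not all the sentences of L
     • All and only: G generates exactly the sentences of L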
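The cases on this slide can be checked with plain set operations. The following is a minimal sketch, assuming a toy finite language. The sentences and the function name `compare` are invented for illustration and do not come from the lecture.

```python
def compare(language, generated):
    """Compare what a grammar generates (G) with the target language (L)."""
    return {
        "overgenerated": generated - language,   # in G but not in L
        "undergenerated": language - generated,  # in L but not in G
    }

# Hypothetical toy data, for illustration only.
L = {"John kicks Bill", "Bill kicks John"}
G = {"John kicks Bill", "kicks John Bill"}

report = compare(L, G)
print(report["overgenerated"])   # sentences G wrongly produces
print(report["undergenerated"])  # sentences of L that G misses
```

A grammar that generates all and only the sentences of L leaves both sets empty. For an infinite language this comparison cannot be done by enumeration, so the sketch only illustrates the definitions.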

  9. Formal Grammar
     • A grammar is a set of rewrite rules
     • Rules have the form LHS → RHS
         • LHS can be rewritten as RHS
         • LHS & RHS are sequences made of words or symbols
     • The lexicon specifies words and their categories: Category → word
         • Category can be rewritten as word

  10. A Simple Grammar/Lexicon
      grammar:
        S → NP VP
        NP → N
        VP → V NP
      lexicon:
        V → kicks
        N → John
        N → Bill
      parse tree: [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
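The grammar and lexicon on this slide are small enough to run by hand or by machine. The sketch below is one possible encoding, not the course's own: the dictionary layout and the function name `expand` are assumptions. It rewrites S until only words remain and so lists every sentence the grammar generates.

```python
from itertools import product

# Rewrite rules from the slide: each left-hand side maps to its right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "V": ["kicks"],
    "N": ["John", "Bill"],
}

def expand(symbol):
    """Return every word sequence that `symbol` can be rewritten as."""
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = sorted(" ".join(words) for words in expand("S"))
print(sentences)
# ['Bill kicks Bill', 'Bill kicks John', 'John kicks Bill', 'John kicks John']
```

Exhaustive enumeration terminates here only because no rule is recursive. A grammar with a recursive rule generates an infinite language and would need a bound on depth or length.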

  11. Formal v. Natural Languages
      Formal Languages
      • Arithmetic: 3290 1 1010101
      • Logic: ∀x man(x) → mortal(x)
      • URL: http://www.cs.um.edu.mt
      Natural Languages
      • English: John saw the dog
      • German: Johann hat den Hund gesehen
      • Maltese: Ġianni ra kelb

  12. Some Points of Similarity
      • Sentences are sequences of words (or symbols).
      • Rules determine which sequences are valid sentences.
      • Sentences have a definite structure.
      • Sentence structure is systematically related to meaning.

  13. Structure Affects Meaning
      I shot an elephant in my trousers
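The two readings of this sentence correspond to two different structures: "in my trousers" attaches either to the verb phrase (the shooting happened in my trousers) or to the noun phrase (the elephant was in my trousers). The sketch below counts structures for a small grammar. The grammar, the lexicon, and the function name `count_parses` are assumptions written for this example, not material from the lecture.

```python
from functools import lru_cache

# Hypothetical binary rules: category -> possible (left, right) pairs.
BINARY = {
    "S": [("NP", "VP")],
    "PP": [("P", "NP")],
    "NP": [("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
}
# Hypothetical lexicon: word -> categories.
LEXICON = {
    "I": {"NP"}, "shot": {"V"}, "an": {"Det"}, "my": {"Det"},
    "elephant": {"N"}, "trousers": {"N"}, "in": {"P"},
}

def count_parses(words, root="S"):
    """Count the distinct parse trees of `words` rooted in `root`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(cat, i, j):
        # Number of ways category `cat` can span words[i:j].
        if j - i == 1:
            return 1 if cat in LEXICON.get(words[i], set()) else 0
        total = 0
        for left, right in BINARY.get(cat, []):
            for k in range(i + 1, j):  # try every split point
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(root, 0, len(words))

print(count_parses("I shot an elephant".split()))                 # 1
print(count_parses("I shot an elephant in my trousers".split()))  # 2
```

The same word sequence receives two trees, and each tree supports a different meaning.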

  14. Points of Difference
      Formal Languages
      • The grammar defines the language
      • Restricted application
      • Non-ambiguous
      Natural Languages
      • The language defines the grammar
      • Universal application
      • Highly ambiguous

  15. Ambiguity
      • Morphological Ambiguity: en-large-ment
      • Lexical Ambiguity: Iraqi Head Seeks Arms
      • Syntactic Ambiguity: small animals and children laugh
      • Semantic Ambiguity: every girl loves a sailor
      • Pragmatic Ambiguity: can you pass the salt?
      • The management of ambiguity is central to the success of CL

  16. I made her duck
      • I cooked a duck for her
      • I cooked a duck belonging to her
      • I created a duck for her
      • I created a duck that now belongs to her
      • I caused her to lower her head
      • I turned her into a duck

  17. Computer Science
      • The study of basic concepts
          • Information
          • Data
          • Algorithm
          • Program
      • The application of these concepts to practical tasks.
      • Implementation of computational models from other fields (meteorology, ..., linguistics)

  18. Information, Data, Algorithm, Program: Information
      • Information is a theoretical concept invented by Shannon in 1948 to measure uncertainty. The units of this measure are called bits.
          • Length – metres
          • Weight – kilos
          • Information – bits
      • 1 bit is the amount of uncertainty inherent in a situation where there are exactly two possible outcomes. Example: for breakfast I will have coffee or I will have tea (nothing else).
      • When I tell you that I have tea, I have conveyed one bit of information.
      • The greater the number of possible outcomes, the more bits of information are involved in the statement that indicates the actual outcome.
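The relation between the number of outcomes and the number of bits can be made concrete. The sketch below assumes that all outcomes are equally likely, a condition the slide leaves implicit. Under that assumption the information in naming one outcome out of n is log2(n) bits. The function name `bits` is chosen here for illustration.

```python
import math

def bits(outcomes):
    """Bits conveyed by naming one of `outcomes` equally likely alternatives."""
    if outcomes < 1:
        raise ValueError("need at least one possible outcome")
    return math.log2(outcomes)

print(bits(2))  # coffee or tea: 1.0
print(bits(8))  # eight equally likely options: 3.0
```

When only one outcome is possible, `bits(1)` is 0.0: a statement that removes no uncertainty conveys no information.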

  19. Information, Data, Algorithm, Program: Data
      • A formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.
      • Example: a telephone directory
      • Unlike information, which is abstract, data is concrete
      • Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.
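The structure described for the telephone directory maps directly onto a list of records. The sketch below is one way to write it down. The class name `Entry`, the helper `lookup`, and all sample values are hypothetical and exist only to show the shape of the data.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """One directory entry: the three fields named on the slide."""
    name: str
    address: str
    number: str

# Hypothetical entries, invented for illustration.
directory = [
    Entry("A. Borg", "1 Example Street", "555-0100"),
    Entry("B. Vella", "2 Sample Road", "555-0199"),
]

def lookup(directory, name):
    """Return every number listed under `name`."""
    return [entry.number for entry in directory if entry.name == name]

print(lookup(directory, "B. Vella"))  # ['555-0199']
```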

  20. Information, Data, Algorithm, Program: Algorithm
      A completely defined procedure for the solution of a given problem in a finite number of steps
      • Designed for a well-defined task.
      • Finite description length.
      • Guaranteed to terminate.
      • Abstract

  21. Algorithm for Chocolate Cake

  22. Program to Add X and Y
      [Flowchart elements: Read X and Y (example: X = 2, Y = 3); test "X = 0?" with branches labelled no and yes; subtract 1 from X; add 1 to Y; Output Y]

  23. Computer Program
      A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.
      • Concrete
      • A program can implement an algorithm.
      • More than one program may implement the same algorithm.
      • Not all programs express good algorithms!

  24. Instructions vs. Execution Steps
      1. Read X
      2. Read Y
      3. X = X-1
      4. Y = Y+1
      5. If X = 0 then Print(Y) else goto 3
      How many instructions? How many execution steps?
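The difference between the two counts can be seen by simulating the program. The sketch below assumes that the final instruction outputs Y, the running sum, as the flowchart on slide 22 does, and that every instruction executed counts as one step. The function name `run` is chosen here for illustration. Because the program subtracts before it tests, it only terminates for X of at least 1, so the sketch rejects smaller values.

```python
def run(x, y):
    """Simulate the five-instruction program; return (output, execution steps)."""
    if x < 1:
        raise ValueError("this program only terminates for x >= 1")
    steps = 2            # instructions 1 and 2: Read X, Read Y
    while True:
        x -= 1           # instruction 3
        y += 1           # instruction 4
        steps += 3       # instructions 3 and 4, plus the test in instruction 5
        if x == 0:       # instruction 5: print and stop, else go back to 3
            return y, steps

print(run(2, 3))  # (5, 8)
```

The program text has five instructions whatever the input, while the number of execution steps grows with X: under these assumptions it is 2 + 3X.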

  25. Algorithms and Linguistics
      • Do linguistic theories in the abstract make sense?
      • Linguistic theories explain linguistic knowledge in the form of
          • grammar rules
          • theories about grammar rules
      • But performance involves processing issues:

  26. Computational Linguistics – Issues
      • How are a grammar and a lexicon represented?
      • How is the structure of a given sentence actually discovered?
      • How can we actually generate a sentence to express a particular intended meaning?
      • How can linguistic theory be made concrete enough to test algorithmically?
      • Can an artificial system learn a language with limited exposure to grammatical sentences?

  27. Computers and Language: Twin Goals
      • Scientific Goal: Contribute to Linguistics by adding a computational dimension.
      • Technological Goal: Develop machinery capable of handling human language that can support “language engineering”

  28. Computers and Language: Tools & Resources
      • Grammar Formalisms, e.g. Definite Clause Grammars
      • Parsing Algorithms: sentence → structure
      • Generation Algorithms: structure → sentence
      • Statistical Methods
      • Linguistic Corpora

  29. Computers and Language: Applications
      • Information Retrieval/Extraction
      • Document Classification
      • Question Answering
      • Style and Spell Checking
      • Multimodal Interaction
      • Machine Translation

  30. LECTURES
