Ling 138/238 Martin Kay Stanford University Introduction to Computational Linguistics
30  Introduction
Oct 1  Complexity; String search
6  Knuth-Morris-Pratt; Boyer-Moore
8  Suffix Trees
13  Tagging; Alignment
15
20  Chomsky Hierarchy; Regular Expressions
22
27  Finite-state automata
29
Nov 3  Morphology
5
10  Context-free grammar
12
17  Unification, HPSG, LFG
19
24  Machine Translation
26
Dec 1  Summary; Wrap-up
3
Linguistics 138/238 Martin Kay KAY@csli.stanford.edu 740 3043 Margaret Jacks 124 Office hours: TuTh 4.15-5.45 p.m.
Prerequisites and Expectations • No prerequisites • Classroom participation • Occasional readings • Learn Prolog • Laboratory sessions • Homework Problems • Project
Project • Learn something new about language • Significant programming • Group work • Modifying or amplifying existing code: an HMM-based tagger; a searcher for tagged text; an implementation of suffix trees; morphological analysis; named-entity recognition
Intellectual Relations Relation to • Linguistics • Psychology • Artificial Intelligence • Computer Science Abstract Process
Computational Linguistics as Science Computing as Inspiration
Ideas from Computing • Search • Divide and Conquer • Guides and Oracles • Nondeterminism • Dynamic Programming • Scheduling, agendas • Compilation • Unification • Automata Theory • Co-routining and parallelism • Top-down vs. bottom-up • Complexity
Ideas from Computing • Search • Nondeterminism • Dynamic Programming
A Maze • Keep your right hand on the wall
A Maze • Out! Back up! Back up! Back up!
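The maze slides can be read as depth-first search with backing up. The following is a minimal sketch under that reading, not the course's own code: the grid `MAZE`, the function `solve`, and the order in which directions are tried are all assumptions made for illustration. It tries one direction at a time and backs up whenever it reaches a dead end.

```python
# Hypothetical maze: '#' is a wall, 'S' the start, 'E' the exit.
MAZE = [
    "#########",
    "#S  #   #",
    "# # # # #",
    "# #   # E",
    "#########",
]

def solve(maze):
    """Return a list of (row, col) cells from 'S' to 'E', or None."""
    start = next((r, c) for r, row in enumerate(maze)
                 for c, ch in enumerate(row) if ch == "S")
    visited = set()
    path = []

    def walk(r, c):
        # Reject cells outside the grid, walls, and cells already tried.
        if not (0 <= r < len(maze) and 0 <= c < len(maze[r])):
            return False
        if maze[r][c] == "#" or (r, c) in visited:
            return False
        visited.add((r, c))
        path.append((r, c))
        if maze[r][c] == "E":
            return True  # Out!
        # Try down, right, up, left in turn.
        for dr, dc in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            if walk(r + dr, c + dc):
                return True
        path.pop()  # Dead end: back up.
        return False

    return path if walk(*start) else None

if __name__ == "__main__":
    print(solve(MAZE))
```

With this direction order the search first walks into the dead-end corridor below the start and has to back up before finding the exit.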
Nondeterminism • A process is nondeterministic if there are points in it where a choice must be made, but the information necessary to make the choice is not available. • Solution: Pick one of the alternatives. If it does not work out, come back and pick another one. • Note: the information required to make the choice was available after all!
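As one illustration of "pick one of the alternatives and come back if it does not work out", here is a small sketch that splits an unspaced string into dictionary words. The function name `segmentations` and the tiny lexicon are hypothetical. At each position the choice of where the next word ends cannot be made locally, so every candidate is tried in turn.

```python
def segmentations(text, lexicon):
    """Yield every way of splitting text into words from lexicon."""
    if not text:
        yield []  # Nothing left to segment: one (empty) solution.
        return
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Tentatively choose this word; if the rest cannot be
            # segmented, the inner loop yields nothing and we simply
            # move on to the next alternative.
            for rest in segmentations(text[end:], lexicon):
                yield [word] + rest

if __name__ == "__main__":
    lexicon = {"the", "them", "and", "a"}  # hypothetical word list
    print(list(segmentations("themand", lexicon)))
```

Here choosing "the" first leads nowhere (nothing in the lexicon starts "mand"), so the process returns to that choice point and picks "them" instead.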
Dynamic Programming
Edit distances between prefixes of "for" (rows) and "pour" (columns):
    p  o  u  r
f   1  2  3  4
o   2  1  2  3
r   3  2  2  2
[Figure: graph of cities (Paris, Chalons, Metz, Strasbourg, Mulhouse, Dijon) with numeric edge labels 192, 266, 161, 458, 619, 288, 234, 115, 620, 344, 276]
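The numbers in the "for"/"pour" grid agree with the standard Levenshtein edit-distance table, in which each cell is computed from its three neighbours above and to the left. The sketch below assumes unit cost for insertion, deletion, and substitution; the function name `edit_distance_table` is an illustrative choice.

```python
def edit_distance_table(source, target):
    """Return the full dynamic-programming table of edit distances.

    table[i][j] is the distance between source[:i] and target[:j].
    """
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # Delete all i characters of the source prefix.
    for j in range(cols):
        table[0][j] = j  # Insert all j characters of the target prefix.
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,                 # deletion
                table[i][j - 1] + 1,                 # insertion
                table[i - 1][j - 1] + substitution,  # match or substitution
            )
    return table

if __name__ == "__main__":
    for row in edit_distance_table("for", "pour")[1:]:
        print(row[1:])
```

Each cell is filled once and reused by its neighbours, which is what makes this dynamic programming rather than repeated search.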
The CKY Chart
people: np np np s s s
like: prep pp pp v vp vp
the: det np np
French: adj n n n
drink: n vp
Context free: All phrases with the same • Coverage, and • Category enter into larger phrases as a single item
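A CKY recognizer for the slide's sentence "people like the French drink" might look like the sketch below. The grammar and lexicon are a toy invented for this example (they are not taken from the course), and the rules are assumed to be in binary form. Storing a set of categories per span is what makes phrases with the same coverage and category count as a single item.

```python
from collections import defaultdict

# Hypothetical toy lexicon and binary rules, for illustration only.
LEXICON = {
    "people": {"NP"},
    "like": {"V", "P"},
    "the": {"Det"},
    "French": {"Adj", "N"},
    "drink": {"N", "VP"},
}
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("P", "NP"): {"PP"},
    ("Det", "N"): {"NP"},
    ("Adj", "N"): {"N"},
    ("NP", "PP"): {"NP"},
}

def cky(words):
    """Return a chart mapping (start, end) spans to sets of categories."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[i, i + 1] = set(LEXICON.get(word, ()))
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[start, mid]:
                    for right in chart[mid, end]:
                        # A category is added once, however many
                        # derivations produce it for this span.
                        chart[start, end] |= BINARY_RULES.get((left, right), set())
    return chart

if __name__ == "__main__":
    sentence = "people like the French drink".split()
    print(sorted(cky(sentence)[0, len(sentence)]))
```

Under this toy grammar the sentence is recognized as an S in two ways (with "like" as a verb or as a preposition), yet the top cell holds S only once.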
Ideas from Computing Unification
Unification
Attribute | Report 1 | Report 2 | Combined Report
eyes | blue | blue | blue
hair | black or brown | brown or red | brown
accent | Italian | Italian
wife | see below | see below | see below
children | Ahmed & Angela | Rebecca & Angela | Ahmed, Angela & Rebecca
age | middle | 48 | Middle
Wife:
eyes | brown | brown
weight | 247 lbs | 112 kg | 247 lbs
disposition | surly | surly
Unification
Attribute | Report 1 | Report 2 | Combined Report
eyes | blue | blue | blue
hair | black or brown | brown or red | brown
accent | Italian | Italian
wife | see below | see below | see below
children | Ahmed & Angela | Rebecca & Angela | Ahmed, Angela & Rebecca
age | middle | 48 | Middle
Wife:
eyes | brown | grey | FAIL
weight | 247 lbs | 112 kg | 247 lbs
disposition | surly | surly
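The way the two reports combine can be sketched as unification of nested attribute-value structures. This is a simplified illustration, not a full feature-structure unifier: `unify` and `UnificationFailure` are hypothetical names, alternatives such as "black or brown" are modelled as frozensets, and the sample data is only loosely based on the table. It does not attempt the set-valued "children" row or the unit conversion in the "weight" row.

```python
class UnificationFailure(Exception):
    """Raised when two descriptions are incompatible."""

def unify(a, b):
    """Combine two descriptions, or raise UnificationFailure."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_value in b.items():
            # Attributes present in only one report are copied over;
            # shared attributes must themselves unify.
            result[key] = unify(a[key], b_value) if key in a else b_value
        return result
    if isinstance(a, dict) or isinstance(b, dict):
        raise UnificationFailure(f"cannot unify {a!r} with {b!r}")
    # Treat an atomic value as a one-element set of alternatives.
    a_alternatives = a if isinstance(a, frozenset) else frozenset([a])
    b_alternatives = b if isinstance(b, frozenset) else frozenset([b])
    common = a_alternatives & b_alternatives
    if not common:
        raise UnificationFailure(f"{a!r} conflicts with {b!r}")
    return next(iter(common)) if len(common) == 1 else common

if __name__ == "__main__":
    report_1 = {"eyes": "blue", "hair": frozenset({"black", "brown"}),
                "accent": "Italian", "wife": {"eyes": "brown"}}
    report_2 = {"eyes": "blue", "hair": frozenset({"brown", "red"}),
                "wife": {"disposition": "surly"}}
    print(unify(report_1, report_2))
```

If the second report instead gave the wife's eyes as grey, the nested call would raise `UnificationFailure`, mirroring the FAIL cell in the table.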
English Agreement • The dog sleeps • The dogs sleep • The dog slept • The dogs slept • The sheep sleeps • The sheep sleep • The sheep slept • The sheep that was in the barn slept • The sheep that were in the barn slept
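German Case • Der Junge sah den Lehrer (The boy saw the teacher) • Den Lehrer sah der Junge (The boy saw the teacher) • Das Mädchen sah der Junge (The boy saw the girl) • Der Junge sah das Mädchen (The boy saw the girl) • Die Lehrerin sah den Lehrer (The woman teacher saw the man teacher) • Die Lehrerin sah das Mädchen (ambiguous: The teacher saw the girl, or The girl saw the teacher)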
Ideas from Computing Finite-State Methods
Finite-State Methods in Language Processing • The application of a branch of mathematics (the regular branch of automata theory) to a branch of computational linguistics in which what is crucial is (or can be reduced to) properties of string sets and string relations, with a notion of bounded dependency
Applications • Finite languages: dictionaries, compression • Phenomena involving bounded dependency: morphology, spelling, hyphenation, tokenization, morphological analysis, phonology • Approximations to phenomena involving mostly bounded dependency: syntax • Phenomena that can be translated into the realm of strings with bounded dependency: syntax
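For the simplest case above, a finite language such as a dictionary, the word list can be stored as a deterministic finite-state automaton shaped like a trie. The sketch below is illustrative only: the names `build_automaton` and `accepts` and the sample words are assumptions, and no minimization (which is where compression would come from) is attempted.

```python
def build_automaton(words):
    """Build a trie-shaped DFA; state 0 is the start state."""
    transitions = [{}]  # transitions[state][char] -> next state
    final_states = set()
    for word in words:
        state = 0
        for char in word:
            if char not in transitions[state]:
                transitions.append({})
                transitions[state][char] = len(transitions) - 1
            state = transitions[state][char]
        final_states.add(state)  # A word ends here.
    return transitions, final_states

def accepts(automaton, string):
    """Return True if the automaton accepts the string."""
    transitions, final_states = automaton
    state = 0
    for char in string:
        state = transitions[state].get(char)
        if state is None:
            return False  # No transition: the string is not in the language.
    return state in final_states

if __name__ == "__main__":
    dictionary = build_automaton(["walk", "walks", "walked", "talk"])
    print(accepts(dictionary, "walked"), accepts(dictionary, "walking"))
```

Words sharing a prefix share states, so "walk", "walks", and "walked" reuse the same first four transitions.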
Ideas from Computing Complexity
The Chomsky Hierarchy
Grammar | Language | Automaton
Type 0 | Recursively enumerable sets | Turing machines
Context-sensitive | Context-sensitive | Nondeterministic linear-space-bounded Turing machines
Context-free | Context-free | Nondeterministic push-down automata
LR(k) | Deterministic context-free | Deterministic push-down automata
Regular expressions; left (right) linear | Regular sets | Finite-state automata
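To make the difference between two levels of the hierarchy concrete, the sketch below contrasts a regular set with a context-free one. The two languages are textbook examples chosen for illustration, not taken from the slides: a*b* can be recognized by a regular expression, while a^n b^n needs unbounded memory, modelled here with a stack as in a deterministic push-down automaton.

```python
import re

def is_a_star_b_star(string):
    """Regular: any number of a's followed by any number of b's."""
    return re.fullmatch(r"a*b*", string) is not None

def is_an_bn(string):
    """Context-free: n a's followed by exactly n b's (n >= 0)."""
    stack = []
    seen_b = False
    for char in string:
        if char == "a":
            if seen_b:
                return False  # An 'a' after a 'b' is out of order.
            stack.append(char)  # Remember one unmatched 'a'.
        elif char == "b":
            seen_b = True
            if not stack:
                return False  # More b's than a's.
            stack.pop()  # Match this 'b' against an earlier 'a'.
        else:
            return False
    return not stack  # Every 'a' must have been matched.

if __name__ == "__main__":
    for s in ("aabb", "aab", "abab"):
        print(s, is_a_star_b_star(s), is_an_bn(s))
```

The string "aab" is in the regular set but not in the context-free one, since the counts of a's and b's differ.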
Computation and Psychology Sentence Processing
Computational Linguistics as Engineering Computing as Power
Tools for Linguists • TLF, OED • Corpus Linguistics • Field Notes • Grammar Testing
Translation • MT, Translator's Tools • Alignment, Dictionaries, Term Banks • Normalization and Tuning
Other Applications • Writer's Tools • Spelling • Dictionary, Thesaurus • Grammar • Natural Language Interfaces • Information Storage and Retrieval
CL & AI • Text, Meaning, and Interpretation [Figure: diagram relating Text, Interpretation, and Meaning, with regions labeled "Linguistics" and "???"]