Midterm Review CS4705 Natural Language Processing
Statistical v. Symbolic Processing • 80/20 Rule • Regular Expressions • Finite State Automata • Determinism v. non-determinism • (Weighted) Finite State Transducers • Morphology • Word Classes and Parts of Speech • Inflectional v. Derivational • Affixation, infixation, concatenation • Morphotactics
Different languages, different morphologies • Evidence from human performance • Morphological parsing • Koskenniemi’s two-level morphology • FSAs vs. FSTs • Porter stemmer • Noisy channel model • Bayesian inference • Spelling correction • Bayesian approach
Minimum Edit Distance (Levenshtein distance) • Dynamic Programming • N-grams • Markov assumption • Chain Rule • Language Modeling • Simple, Adaptive, Class-based (syntax-based) • Smoothing • Add-one, Witten-Bell, Good-Turing • Back-off models
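As a reminder of how the dynamic-programming table for minimum edit distance is filled in, here is a minimal sketch in Python. The function name `min_edit_distance` and the `sub_cost` parameter are illustrative choices, not something taken from the course materials. Insertions and deletions are assumed to cost 1. Setting `sub_cost=1` gives the usual Levenshtein distance, while `sub_cost=2` treats a substitution as a deletion plus an insertion.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance between two strings via dynamic programming.

    Assumption: insertion and deletion cost 1 each; substitution costs
    `sub_cost` (hypothetical parameter; 1 gives plain Levenshtein distance).
    """
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost)  # match or substitution
            )
    return dist[n][m]
```

Each cell depends only on its left, upper, and upper-left neighbours, so the table is filled in O(nm) time instead of enumerating every possible edit sequence.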
Creating and using n-gram LMs • Corpora • Maximum Likelihood Estimation • Syntax • Chomsky’s view: Syntax is cognitive reality • Parse Trees • Dependency Structure • Part-of-Speech Tagging • Handwritten Rules v. Statistical v. Hybrid • Brill Tagging
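To make Maximum Likelihood Estimation for a bigram model concrete, and to contrast it with add-one smoothing, the following is a small sketch. The helper names `train_bigrams` and `bigram_prob`, the `<s>`/`</s>` boundary markers, and the whitespace tokenization are all assumptions made for illustration. The vocabulary size used for smoothing simply counts every observed token type, boundary markers included, which is a simplification.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # Assumed sentence-boundary markers, not part of the corpus itself.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, add_one=False):
    """Estimate P(word | prev), either by MLE or with add-one smoothing."""
    if add_one:
        vocab_size = len(unigrams)  # simplification: all observed types
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    if unigrams[prev] == 0:
        return 0.0  # unseen context: the MLE estimate is undefined, return 0
    return bigrams[(prev, word)] / unigrams[prev]
```

With the unsmoothed estimate, any bigram missing from the training corpus gets probability zero; add-one smoothing moves some probability mass to those unseen bigrams at the cost of discounting the seen ones.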
Types of Ambiguity • Context Free Grammars • Top-down v. Bottom-up Derivations • Left Corners • Grammar Equivalence • Normal Forms (CNF) • Probabilistic Parsing • CYK parser • Derivational Probability • Lexicalization
Machine Learning • Dependent v. Independent variables • Training v. Development Test v. Test sets • Feature Vectors • Metrics • Accuracy • Precision, Recall, F-Measure • Gold Standards
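The evaluation metrics listed above can be computed from a gold standard and a system's output as in the sketch below. The function names and the `positive` and `beta` parameters are illustrative assumptions. Precision, recall, and F-measure are computed for a single label treated as the positive class, and zero denominators are mapped to 0.0 by convention.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall_f(gold, predicted, positive, beta=1.0):
    """Precision, recall, and F-measure for one label (`positive`)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # beta > 1 weights recall more heavily; beta = 1 gives the balanced F1.
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f
```

For example, with gold tags `["NN", "VB", "NN", "DT", "NN"]` and predicted tags `["NN", "NN", "VB", "DT", "NN"]`, accuracy is 3/5, while precision and recall for `NN` are both 2/3.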