Midterm Review CS4705 Natural Language Processing
Midterm Review
• Statistical v. Symbolic Processing
• 80/20 Rule
• Regular Expressions
• Finite State Automata
• Determinism v. non-determinism
• (Weighted) Finite State Transducers
• Morphology
• Word Classes
• Inflectional v. Derivational
• Affixation, infixation, concatenation
• Morphotactics
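As a quick refresher on how regular expressions and deterministic finite state automata describe the same languages, here is a minimal sketch. The toy language (b, then two or more a's, then !), the transition table, and the function names are illustrative assumptions, not material from the slides.

```python
import re

# Hypothetical DFA for the toy language b a a+ ! (e.g. "baa!", "baaaa!").
# Each (state, symbol) pair maps to exactly one next state, which is
# what makes the automaton deterministic.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of additional a's
    (3, "!"): 4,
}
ACCEPTING = {4}


def dfa_accepts(s):
    """Return True if the DFA ends in an accepting state after reading s."""
    state = 0
    for ch in s:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False  # no legal transition: reject
        state = TRANSITIONS[key]
    return state in ACCEPTING


# The same language written as a regular expression.
PATTERN = re.compile(r"baa+!")


def regex_accepts(s):
    """Return True if the whole string matches the regular expression."""
    return PATTERN.fullmatch(s) is not None
```

Both functions should agree on every input string. A non-deterministic version would allow several next states per (state, symbol) pair and would need search or a set of active states.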
• Morphological parsing
• Koskenniemi’s two-level morphology
• Porter stemmer
• Minimum Edit Distance (Levenshtein)
• N-grams
• Markov assumption
• Chain Rule
• Language Modeling
• Simple, Adaptive, Class-based (syntax-based), bursty
• Smoothing
• Add-one, Witten-Bell, Good-Turing
• Back-off
• Perplexity, Entropy
• Maximum Likelihood Estimation
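For the Minimum Edit Distance item, a minimal dynamic-programming sketch may help with review. It assumes unit insertion and deletion costs; the substitution cost is exposed as a parameter (`sub_cost`, a name chosen here for illustration) because some presentations charge 2 for a substitution instead of 1.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Levenshtein-style edit distance between two strings.

    Insertions and deletions cost 1; substitutions cost sub_cost
    (an assumption of this sketch; set it to 2 for the variant that
    treats a substitution as a deletion plus an insertion).
    """
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i source characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j target characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                           # deletion
                dist[i][j - 1] + 1,                           # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("kitten", "sitting"))  # 3
```

The table fills in O(n·m) time; keeping back-pointers alongside each cell would recover the alignment itself.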
• Syntax
• Chomsky’s view: Syntax is cognitive reality
• Parse Trees
• Dependency Structure
• Part-of-Speech Tagging
• Hand Written Rules v. Statistical v. Hybrid
• Brill Tagging
• Types of Ambiguity
• Context Free Grammars
• Top-down v. Bottom-up Derivations
• Left Corners
• Grammar Equivalence
• Normal Forms (CNF)
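The Context Free Grammar, bottom-up, and CNF items come together in a CYK-style recognizer, which fills a table of spans from the words upward and relies on every rule being either A → B C or A → word. The sketch below is one possible illustration; the toy grammar, lexicon, and function name are hypothetical and not taken from the course.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form.
# Binary rules are indexed by their right-hand side: (B, C) -> {A, ...}
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules: word -> set of preterminals
LEXICAL_RULES = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}


def cyk_recognize(words, start="S"):
    """Return True if the toy grammar derives the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # table[(i, j)] holds the nonterminals that can span words[i:j]
    table = defaultdict(set)
    for i, word in enumerate(words):
        table[(i, i + 1)] = set(LEXICAL_RULES.get(word, ()))
    # Build longer spans bottom-up from shorter ones.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for left in table[(i, k)]:
                    for right in table[(k, j)]:
                        table[(i, j)] |= BINARY_RULES.get((left, right), set())
    return start in table[(0, n)]
```

For example, `cyk_recognize("the dog chased the cat".split())` succeeds because NP and VP spans combine into S. A probabilistic variant would store the best probability per nonterminal in each cell instead of a plain set.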
• Probabilistic Parsing
• (p)CYK, Earley Parsing
• Derivational Probability
• Lexicalization
• Classification
• Supertagging
• Machine Learning
• Dependent v. Independent variables
• Training v. Development Test v. Test sets
• Feature Vectors
• Metrics
• Accuracy
• Precision, Recall, F-Measure
• Gold Standards
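To make the Metrics items concrete, here is a minimal sketch that scores predicted labels against a gold standard. The function names, the `beta` parameter, and the sample tag sequences are assumptions made for illustration.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall_f(gold, predicted, positive, beta=1.0):
    """Precision, recall, and F-measure for one label treated as positive.

    beta weights recall relative to precision; beta=1.0 gives the
    balanced F1 score.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f


# Hypothetical gold-standard tags and system output for six tokens.
gold = ["NN", "VB", "NN", "NN", "VB", "JJ"]
pred = ["NN", "NN", "NN", "VB", "VB", "JJ"]
print(accuracy(gold, pred))
print(precision_recall_f(gold, pred, positive="NN"))
```

Precision asks how many of the items labeled positive were right; recall asks how many of the truly positive items were found. Accuracy alone can mislead when one label dominates the data.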