This course teaches statistical techniques for language technologies, emphasizing probability, statistics, and information theory. The interactive classes focus on intuition rather than rigor, with a hands-on approach to programming; background reading supplies the rigor and detail. Prerequisites include programming skill, comfort with probabilities, and familiarity with Bayes' equation. Collaborative assignments are submitted via Blackboard. The syllabus covers N-grams, clustering, HMMs, POS tagging, decision trees, and more. Language models, maximum entropy (ME) modeling, and dimensionality reduction techniques are also explored.
11-761 Language and Statistics • Spring 2016 • Roni Rosenfeld • http://www.cs.cmu.edu/~roni/11761-s16/
Course Goals and Style • Teaching the statistical foundations and techniques for language technologies • Plugging gaping holes in LTI/CS grad students' education in probability, statistics, and information theory
Course philosophy • Socratic method • Will try to maintain it in spite of the large class size • Participation strongly encouraged (please state your name) • Highly interactive • Highly adaptable • based on how fast we move • Lots of probability, statistics, information theory • not in the abstract, but rather as the need arises • Lectures emphasize intuition, not rigor or detail • background reading will have the rigor & detail
Course Prerequisites & Mechanics • You need to be able to program, from scratch • Largest program is O(100) lines • You need to be comfortable with probabilities • Can you derive Bayes' equation in your sleep? (a refresher follows below) • 11-661 (master's level): no final project • Hand in assignments via Blackboard • Vigorous enforcement of the collaboration & disclosure policy
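For anyone who wants that refresher: Bayes' equation follows in one step from the definition of conditional probability. Since P(A|B) P(B) = P(A, B) = P(B|A) P(A), dividing by P(B) gives

P(A|B) = P(B|A) P(A) / P(B), where P(B) = Σ_A' P(B|A') P(A').

The source-channel formulation in the syllabus below is this equation at work: to decode, pick the hypothesis A maximizing P(B|A) P(A), since P(B) is constant across hypotheses.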
Background Material No single book covers all of the course material. • “Foundations of Statistical NLP”, Manning & Schütze • Computational Linguistics perspective • “Statistical Methods in Speech Recognition”, Jelinek • “Text Compression”, Bell, Cleary & Witten • first 4 chapters; the rest is mostly about text compression itself • “Probability and Statistics”, DeGroot • “All of Statistics” & “All of Nonparametric Statistics”, Wasserman • Lots of individual articles
High Level Syllabus (subject to change) • Language Technology formalisms • source-channel formulation • Bayes classifier • Words, Words, Words • type vs. token, Zipf, Mandelbrot, heterogeneity of language • Modeling Word distributions - the unigram: • [estimators, ML, zero frequency, smoothing, shrinkage, G-T] • N-grams: • Deleted Interpolation Model, backoff, toolkit • Measuring Success: perplexity • [entropy, KL-div, MI], the entropy of English, alternatives (the unigram and perplexity items are illustrated in the sketch below)
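To make the unigram and perplexity items concrete, here is a minimal Python sketch. It is not course code: the toy corpus and the choice of add-one smoothing are assumptions made purely for illustration (the course covers better estimators, e.g. Good-Turing).

import math
from collections import Counter

def unigram_addone(train_tokens, vocab):
    # Add-one (Laplace) smoothing: the ML estimate count(w)/N would assign
    # probability zero to unseen words, making test perplexity infinite.
    counts = Counter(train_tokens)
    total = len(train_tokens) + len(vocab)  # one extra count per vocabulary type
    return {w: (counts[w] + 1) / total for w in vocab}

def perplexity(model, test_tokens):
    # Perplexity = 2 ** (cross-entropy in bits per token).
    log2_sum = sum(math.log2(model[w]) for w in test_tokens)
    return 2 ** (-log2_sum / len(test_tokens))

train = "the cat sat on the mat the cat".split()
test = "the mat sat on the cat".split()
vocab = set(train) | set(test)

model = unigram_addone(train, vocab)
print(round(perplexity(model, test), 2))

Lower perplexity means the model is less "surprised" by the test data; a uniform model over this 5-word vocabulary would score exactly 5.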
Syllabus (continued) • Clustering: • class-based N-grams, hierarchical clustering • hard and soft clustering • Latent Variable Models, EM • Hidden Markov Models, revisiting interpolated and class n-grams (see the Viterbi sketch below) • Part-Of-Speech tagging, Word Sense Disambiguation • Decision & Regression Trees • particularly as applied to language • Stochastic Grammars • (SCFG, inside-outside algorithm, Link grammar)
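As a pointer to what the HMM/POS-tagging unit involves, below is a minimal Viterbi decoder. The two-tag model and all its probabilities are invented for this example; a real tagger would estimate them from tagged training data.

import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    # Dynamic program over tag sequences, in log space to avoid underflow.
    # V[s] = (best log-prob of any path ending in state s, that path).
    V = {s: (math.log(start_p[s] * emit_p[s][obs[0]]), [s]) for s in states}
    for word in obs[1:]:
        layer = {}
        for s in states:
            score, path = max(
                (V[p][0] + math.log(trans_p[p][s] * emit_p[s][word]), V[p][1] + [s])
                for p in states)
            layer[s] = (score, path)
        V = layer
    return max(V.values())[1]

# Toy two-tag model; every number below is made up for illustration.
states = ["DET", "NOUN"]
start_p = {"DET": 0.8, "NOUN": 0.2}
trans_p = {"DET": {"DET": 0.1, "NOUN": 0.9},
           "NOUN": {"DET": 0.5, "NOUN": 0.5}}
emit_p = {"DET": {"the": 0.9, "dog": 0.05, "barks": 0.05},
          "NOUN": {"the": 0.1, "dog": 0.6, "barks": 0.3}}

print(viterbi("the dog barks".split(), states, start_p, trans_p, emit_p))
# -> ['DET', 'NOUN', 'NOUN']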
Syllabus (continued) • Maximum Entropy Modeling • exponential models, the ME principle, feature induction... • Language Model Adaptation • caches, backoff • Dimensionality reduction • latent semantic analysis (sketched below), word2vec • Syntactic Language Models
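To give a flavor of the dimensionality-reduction unit: latent semantic analysis amounts to a truncated SVD of a word-document count matrix. The tiny matrix below is fabricated for illustration; real LSA uses large corpora and typically a few hundred latent dimensions.

import numpy as np

# Toy word-by-document count matrix; rows are words, columns are documents.
words = ["cat", "dog", "pet", "stock", "market"]
X = np.array([[2., 1., 0., 0.],   # cat
              [1., 2., 0., 0.],   # dog
              [2., 2., 0., 1.],   # pet
              [0., 0., 2., 2.],   # stock
              [0., 0., 1., 2.]])  # market

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                             # keep k latent "semantic" dimensions
word_vecs = U[:, :k] * S[:k]      # word i is embedded as row i

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words with similar document distributions land close together in latent space.
print(cosine(word_vecs[0], word_vecs[1]))  # cat vs. dog: near 1
print(cosine(word_vecs[0], word_vecs[3]))  # cat vs. stock: near 0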