
11-761 Language and Statistics

This course teaches statistical techniques for language technologies, emphasizing probability, statistics, and information theory. The interactive lectures emphasize intuition rather than rigor, with a hands-on approach to programming; background reading supplies the rigor and detail. Prerequisites include the ability to program from scratch, comfort with probabilities, and fluency with Bayes' rule. Assignments are submitted via Blackboard under a strictly enforced collaboration and disclosure policy. The syllabus covers N-grams, clustering, HMMs, POS tagging, decision trees, language models, maximum entropy modeling, dimensionality reduction, and more.


Presentation Transcript


  1. 11-761 Language and Statistics Spring 2016 Roni Rosenfeld http://www.cs.cmu.edu/~roni/11761-s16/

  2. Course Goals and Style • Teaching statistical foundations and techniques for language technologies • Plugging gaping holes in LTI/CS grad-student education in probability, statistics, and information theory.

  3. Course Philosophy • Socratic method • will try to maintain it in spite of the large class size • participation strongly encouraged (please state your name) • Highly interactive • Highly adaptable • based on how fast we move • Lots of probability, statistics, and information theory • not in the abstract, but rather as the need arises • Lectures emphasize intuition, not rigor or detail • background reading will supply the rigor and detail

  4. Course Prerequisites & Mechanics • You need to be able to program, from scratch. • Largest program is O(100) lines • You need to be comfortable with probabilities • Can you derive Bayes' rule in your sleep? • 11-661 (Masters level): no final project • Hand in assignments via Blackboard • Vigorous enforcement of the collaboration & disclosure policy
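For the Bayes'-rule self-check above, a quick refresher: the rule drops out of writing the joint probability two ways.

```latex
P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```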

  5. Background Material No single book covers all of the course material. • “Foundations of Statistical NLP”, Manning & Schütze • Computational Linguistics perspective • “Statistical Methods in Speech Recognition”, Jelinek • “Text Compression”, Bell, Cleary & Witten • first 4 chapters; the rest is mostly text compression • “Probability and Statistics”, DeGroot • “All of Statistics” & “All of Nonparametric Statistics”, Wasserman • Lots of individual articles

  6. High-Level Syllabus (subject to change) • Language Technology formalisms • source-channel formulation • Bayes classifier • Words, Words, Words • type vs. token, Zipf, Mandelbrot, heterogeneity of language • Modeling word distributions - the unigram: • [estimators, ML, zero frequency, smoothing, shrinkage, Good-Turing] • N-grams: • deleted interpolation model, backoff, toolkit (see the sketch below) • Measuring success: perplexity • [entropy, KL divergence, mutual information], the entropy of English, alternatives
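As a minimal sketch of the N-gram and perplexity bullets above (not course code; the toy corpus, the choice of add-one smoothing, and all function names are illustrative assumptions), here is a bigram model in Python:

```python
import math
from collections import Counter

def train_bigram_addone(tokens, vocab):
    """Bigram model with add-one (Laplace) smoothing, the simplest
    stand-in for the smoothing/shrinkage methods listed above."""
    context_counts = Counter(tokens[:-1])             # counts of w_{i-1}
    bigram_counts = Counter(zip(tokens, tokens[1:]))  # counts of (w_{i-1}, w_i)
    V = len(vocab)
    def prob(w_prev, w):
        return (bigram_counts[(w_prev, w)] + 1) / (context_counts[w_prev] + V)
    return prob

def perplexity(prob, tokens):
    """Perplexity = 2 ** (average negative log2 probability per token)."""
    log2_total = sum(math.log2(prob(p, w)) for p, w in zip(tokens, tokens[1:]))
    return 2 ** (-log2_total / (len(tokens) - 1))

train = "<s> the cat sat on the mat </s>".split()
model = train_bigram_addone(train, vocab=set(train))
print(perplexity(model, "<s> the cat sat </s>".split()))
```

Lower perplexity means the model assigns higher probability to the test text; an unseen bigram such as ("sat", "</s>") gets nonzero mass only because of the smoothing.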

  7. Syllabus (continued) • Clustering: • class-based N-grams, hierarchical clustering • hard and soft clustering • Latent Variable Models, EM • Hidden Markov Models, revisiting interpolated and class N-grams • Part-of-Speech tagging, Word Sense Disambiguation (see the tagging sketch below) • Decision & Regression Trees • particularly as applied to language • Stochastic Grammars • (SCFG, inside-outside algorithm, Link Grammar)
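To make the HMM and POS-tagging items concrete, here is a toy Viterbi decoder in Python; the tag set, words, and every probability below are invented for illustration, not taken from the course.

```python
import math

states = ["DET", "NOUN", "VERB"]                      # toy tag set
start = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}        # P(tag_1)
trans = {                                             # P(tag_i | tag_{i-1})
    "DET":  {"DET": 0.05, "NOUN": 0.90, "VERB": 0.05},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.30, "VERB": 0.20},
}
emit = {                                              # P(word | tag)
    "DET":  {"the": 0.9, "a": 0.1},
    "NOUN": {"cat": 0.5, "mat": 0.5},
    "VERB": {"sat": 1.0},
}

def viterbi(words):
    """Most likely tag sequence: dynamic programming in log space."""
    lp = lambda x: math.log(x) if x > 0 else float("-inf")
    delta = {s: lp(start[s]) + lp(emit[s].get(words[0], 0)) for s in states}
    backptrs = []
    for w in words[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + lp(trans[r][s]))
            delta[s] = prev[best] + lp(trans[best][s]) + lp(emit[s].get(w, 0))
            ptr[s] = best
        backptrs.append(ptr)
    tags = [max(states, key=delta.get)]               # best final tag
    for ptr in reversed(backptrs):                    # follow back-pointers
        tags.append(ptr[tags[-1]])
    return list(reversed(tags))

print(viterbi("the cat sat".split()))  # -> ['DET', 'NOUN', 'VERB']
```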

  8. Syllabus (continued) • Maximum Entropy Modeling • exponential models, ME principle, feature induction... • Language Model Adaptation • caches, backoff • Dimensionality reduction • latent semantic analysis, word2vec • Syntactic Language Models
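As a sketch of the latent-semantic-analysis item above, the code below runs truncated SVD on an invented term-document count matrix (NumPy assumed available; real LSA usually applies tf-idf weighting first, and word2vec would replace the SVD with trained neural embeddings):

```python
import numpy as np

# Invented term-document count matrix: rows = terms, columns = documents.
terms = ["cat", "dog", "pet", "stock", "market"]
X = np.array([
    [2.0, 1.0, 0.0, 0.0],   # cat
    [1.0, 2.0, 0.0, 0.0],   # dog
    [1.0, 1.0, 0.0, 1.0],   # pet
    [0.0, 0.0, 3.0, 1.0],   # stock
    [0.0, 0.0, 2.0, 2.0],   # market
])

# LSA: keep the top-k singular vectors as a low-dimensional "semantic" space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]   # k-dimensional term embeddings

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

i, j = terms.index("cat"), terms.index("dog")
print(f"sim(cat, dog) = {cosine(term_vecs[i], term_vecs[j]):.3f}")
```

Terms with similar document distributions ("cat" and "dog" here) end up with high cosine similarity in the reduced space.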
