N-Grams
Read J & M Chapter 6, Sections 1, 2, 3 (minus Good-Turing), and 6. Corpora, Types, and Tokens. Large corpora of machine-readable texts are now available in many languages. One good source is Project Gutenberg (http://www.promo.net/pg/). We can analyze a corpus into a set of: