I256: Applied Natural Language Processing. Marti Hearst Aug 30, 2006. Today. Introductions Python Basics. Introduction to NLTK. The Natural Language Toolkit (NLTK) provides: Basic classes for representing data relevant to natural language processing.
I256: Applied Natural Language Processing Marti Hearst Aug 30, 2006
Today • Introductions • Python Basics
Introduction to NLTK • The Natural Language Toolkit (NLTK) provides: • Basic classes for representing data relevant to natural language processing. • Standard interfaces for performing tasks, such as tokenization, tagging, and parsing. • Standard implementations of each task, which can be combined to solve complex problems. • Pre-parsed corpora and tools to access them. Slide by Diane Litman
NLTK: Example Modules • nltk_lite.tokenize: processing individual elements of text, such as words or sentences. • nltk_lite.probability: modeling frequency distributions and probabilistic systems. • nltk_lite.tag: tagging tokens with supplemental information, such as parts of speech or wordnet sense tags. • nltk_lite.parser: high-level interface for parsing texts. Slide by Diane Litman
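To make the tokenize and probability bullets concrete, here is a minimal sketch that uses only the Python standard library. It is not the nltk_lite API: the function names `tokenize` and `frequency_distribution` and the sample sentence are hypothetical stand-ins, and the code uses Python 3 syntax rather than the Python 2 of the slides.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text.lower())


def frequency_distribution(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


sample = "The cat sat on the mat. The mat was flat."
tokens = tokenize(sample)
freq = frequency_distribution(tokens)

# Show the two most frequent tokens with their counts.
print(freq.most_common(2))
```

A real toolkit would handle punctuation, contractions, and sentence boundaries far more carefully; the point here is only the shape of the pipeline: text in, tokens out, counts over tokens.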
Python and Natural Language Processing • Python is a great language for NLP: • Simple (and fun!) • Powerful string manipulation • Easy to debug: • Interpreted language • Easy to test small steps incrementally • Exceptions • Easy to structure • Modules • Object oriented programming Slide by Diane Litman
An Interpreted Language • The interpreter processes what you’ve typed as soon as you hit <return>:
>>> 3 * 4
12
>>>
• Python is sensitive to leading whitespace • If you put in extra spaces, or too few, it will complain. • If you type a multi-line command, you must do the indenting; the interpreter helps you with this:
>>> if 4 > 3:
        print "duh"

duh
>>>
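The whitespace rule can be sketched in a script as well as at the prompt. The example below is an illustration under assumptions: it uses Python 3 syntax (print is a function, unlike the Python 2 session on the slide), and the function name `describe` is hypothetical. It uses the built-in `compile()` only to show the complaint about a missing indent without stopping the script.

```python
def describe(n):
    # The indented lines under "if" and "else" form the body of each branch.
    if n > 3:
        label = "big"
    else:
        label = "small"
    return label


print(describe(4))

# The same kind of statement with the body left unindented is rejected
# before any of it runs. compile() lets us observe the error safely.
bad_source = "if 4 > 3:\nprint('duh')\n"
try:
    compile(bad_source, "<example>", "exec")
    indentation_ok = True
except IndentationError:
    indentation_ok = False

print(indentation_ok)
```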
Some Python Basics • Strings
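The slide's own examples were not preserved, so the following is a small stand-in sketch of common string operations, not the original slide content. The sample sentence and variable names are assumptions, and the syntax is Python 3.

```python
sentence = "Colorless green ideas sleep furiously"

first_word = sentence[:9]            # slicing: characters 0 through 8
shouted = sentence.upper()           # a new string; the original is unchanged
words = sentence.split()             # split on whitespace into a list of words
rejoined = "-".join(words)           # glue the words back together with hyphens
has_green = "green" in sentence      # substring test
position = sentence.find("ideas")    # index of the first match, or -1 if absent

print(first_word, len(words), has_green, position)
```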
Some Python Basics • Lists
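The slide's own examples were not preserved, so this is a stand-in sketch of basic list operations rather than the original content. The word list is an assumption, and the syntax is Python 3.

```python
words = ["the", "cat", "sat"]

words.append("down")         # lists grow in place
first = words[0]             # indexing starts at 0
last = words[-1]             # negative indices count from the end
middle = words[1:3]          # slicing returns a new list
words[1] = "dog"             # lists are mutable, unlike strings
sorted_words = sorted(words) # sorted() returns a new list; words keeps its order

print(words, middle, sorted_words)
```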
Some Python Basics • Iteration over Lists
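The slide's own examples were not preserved, so this is a stand-in sketch of iterating over a list, not the original content. The word list is an assumption, and the syntax is Python 3.

```python
words = ["natural", "language", "processing"]

# A for loop visits each element in order; the indented line is the loop body.
lengths = []
for word in words:
    lengths.append(len(word))

# A list comprehension builds a new list from a loop and an optional filter.
long_words = [w for w in words if len(w) > 7]

# enumerate() pairs each element with a running count.
numbered = list(enumerate(words, start=1))

print(lengths, long_words, numbered)
```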
Modules and Packages • Python modules “package program code and data for reuse.” (Lutz) • Similar to a library in C or a package in Java. • Python packages are hierarchical modules (i.e., modules that contain other modules). • Three commands for accessing modules: • import • from…import • reload Slide by Diane Litman
Modules and Packages: import • The import command loads a module:
# Load the regular expression module
>>> import re
• To access the contents of a module, use dotted names:
# Use the search method from the re module
>>> re.search('\w+', str)
• To list the contents of a module, use dir:
>>> dir(re)
['DOTALL', 'I', 'IGNORECASE', …]
Slide by Diane Litman
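The three steps on this slide can be run end to end as a short script. This is a sketch under assumptions: the sample string and the variable name `text` are hypothetical (the slide's `str` is a placeholder that would shadow the built-in type), and the syntax is Python 3.

```python
import re

text = "Hello, NLTK world"

# Dotted name: module.function
match = re.search(r"\w+", text)
first_token = match.group()   # the text of the first run of word characters

# dir() returns the names defined in the module as a list of strings.
names = dir(re)

print(first_token)
print("search" in names, "IGNORECASE" in names)
```

The raw-string prefix `r` keeps the backslash in `\w` from being interpreted by Python before the pattern reaches the re module.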
Modules and Packages: from…import • The from…import command loads individual functions and objects from a module:
# Load the search function from the re module
>>> from re import search
• Once an individual function or object is loaded with from…import, it can be used directly:
# Use the search function from the re module
>>> search('\w+', str)
Slide by Diane Litman
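As a runnable sketch of the same idea, the script below loads two functions by name and calls them without a module prefix. The sample string and variable names are assumptions, and the syntax is Python 3.

```python
from re import search, findall

text = "tokenize this sentence"

# No "re." prefix is needed: the names were copied into this namespace.
match = search(r"\w+", text)
first_token = match.group()
tokens = findall(r"\w+", text)

print(first_token, tokens)
```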
Import vs. from…import • import • Keeps module functions separate from user functions. • Requires the use of dotted names. • Works with reload. • from…import • Puts module functions and user functions together. • More convenient names. • Does not work with reload. Slide by Diane Litman
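The "separate" versus "together" distinction can be seen with a name clash. In this sketch the user-defined `sqrt` is a hypothetical function chosen only because it collides with a name in the standard math module; the syntax is Python 3.

```python
import math


def sqrt(x):
    """A user function that happens to share its name with math.sqrt."""
    return "my sqrt of %s" % x


# With plain import, the two functions coexist: one bare, one dotted.
user_result = sqrt(16)
module_result = math.sqrt(16)

# from...import binds the module's function to the bare name,
# silently replacing the user function defined above.
from math import sqrt

shadowed_result = sqrt(16)

print(user_result, module_result, shadowed_result)
```

The convenience of the shorter name comes at the cost of this kind of silent replacement, which is one reason to prefer dotted names in larger programs.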
Modules and Packages: reload • If you edit a module, you must use the reload command before the changes become visible in Python:
>>> import mymodule
...
>>> reload(mymodule)
• The reload command only affects modules that have been loaded with import; it does not update individual functions and objects loaded with from...import. Slide by Diane Litman
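The difference between the two loading styles under reload can be checked with a throwaway module. This sketch makes several assumptions: the module name `mymodule_demo` and its `greeting` function are hypothetical, the file is written to a temporary directory, and the code targets Python 3, where the built-in reload shown on the slide is available as `importlib.reload`.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep this demo from reusing cached bytecode

# Create a tiny module on disk in a temporary directory.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "mymodule_demo.py")


def write_module(body):
    """Overwrite the demo module's source file with the given text."""
    with open(module_path, "w") as handle:
        handle.write(body)


write_module("def greeting():\n    return 'hello'\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import mymodule_demo                  # loaded with import
from mymodule_demo import greeting    # loaded with from...import

before = mymodule_demo.greeting()

# "Edit" the module, then reload it.
write_module("def greeting():\n    return 'goodbye'\n")
importlib.reload(mymodule_demo)

after_dotted = mymodule_demo.greeting()  # the dotted name sees the edit
after_direct = greeting()                # the directly imported name does not

print(before, after_dotted, after_direct)
```

The directly imported name still refers to the old function object, so it has to be imported again after the reload to pick up the change.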
Configuring the Python IDE • Called IDLE • You can set key bindings • Go to Options > Configure IDLE • Select Keys tab • Select an action and specify an alternative binding • Click Save as New Custom Key Set • Give it a name • Click Apply so it takes hold • If you want to use an existing binding (say, Control-A) • First find the command that has that binding • Change it to something else • Click Apply • Now choose your command and change its binding to Control-A
For Next Week • Monday: holiday, no class • Sign up for the email list! • Mail to: majordomo@sims.berkeley.edu • Put in msg body: subscribe anlp • For Wed Sept 6 • Finish the programming tutorial. • Do the regular expression tutorial. • We’ll go through some regular expressions in class.