Introduction to NLTK ELN – Natural Language Processing Giuseppe Attardi
Installing NLTK • Download and Install • http://nltk.org/install.html • Download NLTK data
>>> import nltk
>>> nltk.download()
NLTK • Suite of classes for several NLP tasks • Parsing, POS tagging, classifiers… • Several text processing utilities, corpora • Brown, Penn Treebank corpus… • Your data was divided into sentences using ‘punkt’
NLTK • Text material • Raw text • Annotated Text • Tools • Part of speech taggers • Semantic analysis • Resources • WordNet, Treebanks
Linguistic Tasks • Part of Speech Tagging • Parsing • WordNet • Named Entity Recognition • Information Retrieval • Sentiment Analysis • Document Clustering • Topic Segmentation • Authoring • Machine Translation • Summarization • Information Extraction • Spoken Dialog Systems • Natural Language Generation • Word Sense Disambiguation
Part of Speech Tagging • Task: Given a string of words, identify the part of speech of each word. A man walks into a bar. Det Noun Verb Prep Det Noun
POS Tag Usage • Surface-level syntax • Primary operation • Parsing • Word Sense Disambiguation • Semantic Role Labeling • Segmentation • Discourse, Topic, Sentence
How to do it? • Learn from Data. • Annotated Data: A man walks into a bar. Det Noun Verb Prep Det Noun • Unlabeled Data: A man walks home. The pitcher issued four walks.
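To make "learn from data" concrete, here is a minimal sketch of a most-frequent-tag baseline written in plain Python. It is not NLTK's tagger: the function names `train_unigram_tagger` and `tag` are hypothetical, and the only training data is the single annotated sentence from the slide.

```python
from collections import Counter, defaultdict

# Annotated data from the slide: one sentence of (word, tag) pairs.
annotated = [
    [("A", "Det"), ("man", "Noun"), ("walks", "Verb"),
     ("into", "Prep"), ("a", "Det"), ("bar", "Noun")],
]

def train_unigram_tagger(sentences):
    """Count how often each word carries each tag; keep the most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, default=None):
    """Tag each word with its learned tag, or `default` if it was never seen."""
    return [(w, model.get(w.lower(), default)) for w in words]

model = train_unigram_tagger(annotated)

# The two unlabeled sentences from the slide.
print(tag("A man walks home".split(), model))
print(tag("The pitcher issued four walks".split(), model))
```

With this little training data, unseen words such as "home" or "pitcher" get no tag at all, and "walks" is tagged Verb in both sentences even though it is a noun in the second. That ambiguity is why a tagger needs more annotated data and some use of context.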
‘import nltk’ • You need to import the necessary modules before you can create objects and call their member functions • import is roughly analogous to include: it brings in objects from pre-built packages • FreqDist, ConditionalFreqDist are in nltk.probability • PlaintextCorpusReader is in nltk.corpus
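As an illustration of these imports, the following sketch builds a FreqDist and a ConditionalFreqDist from the example sentence used earlier. It assumes NLTK 3 is installed; no downloaded NLTK data is needed, and the variable names are only illustrative.

```python
from nltk.probability import FreqDist, ConditionalFreqDist

# The annotated example sentence, lowercased, as (word, tag) pairs.
tagged = [("a", "Det"), ("man", "Noun"), ("walks", "Verb"),
          ("into", "Prep"), ("a", "Det"), ("bar", "Noun")]
words = [word for word, _ in tagged]

# FreqDist counts how often each sample (here, each word) occurs.
fd = FreqDist(words)
print(fd["a"])        # number of times "a" occurs
print(fd.N())         # total number of tokens counted

# ConditionalFreqDist keeps one FreqDist per condition (here, per word),
# counting the tags observed for that word.
cfd = ConditionalFreqDist(tagged)
print(cfd["a"]["Det"])      # how often "a" was tagged Det
print(cfd["walks"].max())   # most frequent tag seen for "walks"
```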
Exercise 1. • Run examples from Chapter 1 of NLTK book: • http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html
Exercise 2. • Run examples from Chapter 3 of NLTK book • http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html