170 likes | 266 Views
Old Business. def fact(x ): if(x <= 1): return 1 else: return x * fact(x-1) def binomial_coef(n,k ): return fact(n ) / ( fact(k ) * fact(n-k )) for n in range(9): print [ binomial_coef(n,k ) for k in range(n+1)] [1] [1, 1] [1, 2, 1] [1, 3, 3, 1] [1, 4, 6, 4, 1]
E N D
def fact(x): if(x <= 1): return 1 else: return x* fact(x-1) def binomial_coef(n,k): return fact(n) / (fact(k) * fact(n-k)) for n in range(9): print [binomial_coef(n,k) for k in range(n+1)] [1] [1, 1] [1, 2, 1] [1, 3, 3, 1] [1, 4, 6, 4, 1] [1, 5, 10, 10, 5, 1] [1, 6, 15, 20, 15, 6, 1] [1, 7, 21, 35, 35, 21, 7, 1] [1, 8, 28, 56, 70, 56, 28, 8, 1]
Homework for next time • What are the 10 most common words in • Moby Dick? • Make a concordance of the 3rd most common word • Do the same for • https://jshare.johnshopkins.edu/kchurch4/public_html/teaching/103/Spring2011/ • What are the 10 most common words on this page? • Make a concordance of the 3rd most common word
No need to buy the book • Free online at http://www.nltk.org/book • Read Chapter 1 • http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html • Install NLTK (see next slide) • Warning: It might not be easy (and it might not be your fault) • Let us know how it goes • (both positive and negative responses are more appreciated)
Installing NLTKhttp://nltk.googlecode.com/svn/trunk/doc/book/ch01.html • Chapter 01: pp. 1 - 4 • Python • NLTK • Data
Python Objects Lists Strings >>> sent1[0] 'Call' >>> type(sent1[0]) <type 'str'> >>> sent1[0][0] 'C' >>> sent1[0][1:len(sent1[0])] 'all' >>> sent1 ['Call', 'me', 'Ishmael', '.'] >>> type(sent1) <type 'list'> >>> sent1[0] 'Call' >>> sent1[1:len(sent1)] ['me', 'Ishmael', '.'] First Rest
Types & Tokens Polymorphism Polymorphism
Tokens Types
Tokens FreqDist Types
Works with almost any URL! >>>url="https://jshare.johnshopkins.edu/kchurch4/public_html/teaching/103/Spring2011/Web_Programming_Lecture/WebProgramming/javascript_example_with_sounds.html" >>> def url2text(url): html = urlopen(url).read() raw = nltk.clean_html(html) tokens = nltk.word_tokenize(raw) return nltk.Text(tokens) >>> text=url2text(url) >>> text.concordance('Nonsense')
Hints for Homework import nltk from urllib import urlopen def url2text(url): html = urlopen(url).read() raw = nltk.clean_html(html) tokens = nltk.word_tokenize(raw) return nltk.Text(tokens) url = “https://jshare.johnshopkins.edu/kchurch4/public_html/teaching/103/Spring2011/” t = url2text(url) t.concordance("web”)