Python Code Examples
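Word Spotting
Print every word in a text file that ends in "ing":
fname1 = r"c:\Python Course\ex1.txt"   # raw string: the backslashes in the Windows path stay literal
with open(fname1, 'r') as f:
    for line in f:                     # iterate over the file line by line
        for word in line.split():
            if word.endswith('ing'):
                print(word)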
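Below is a minimal Python 3 sketch of the same idea as a reusable function. It assumes you also want to ignore surrounding punctuation, so that "running," still counts. The name `spot_words` and the `suffix` parameter are hypothetical and do not come from the original slide.

```python
import string

def spot_words(lines, suffix="ing"):
    """Yield every word in `lines` that ends with `suffix`, ignoring surrounding punctuation."""
    for line in lines:
        for word in line.split():
            word = word.strip(string.punctuation)  # "running," -> "running"
            if word.endswith(suffix):
                yield word

sample = ["Running late, he kept singing.", "Nothing else."]
print(list(spot_words(sample)))  # ['Running', 'singing', 'Nothing']
```

The function accepts any iterable of lines, so an open file object works as well as a list: `spot_words(open(path))`.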
Creating a Dictionary of First Names
def createNameDict():
    # names.txt holds first names separated by commas
    with open('project/dictionaries/names.txt', 'r') as dictNameFile:
        dictContent = dictNameFile.read()   # read the whole file
    dictWords = dictContent.split(",")      # list of the names
    nameDict = {}                           # initialize a dictionary
    for word in dictWords:
        nameDict[word.strip()] = " "        # add each name to the dictionary
    return nameDict
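The dictionary values are only a `" "` placeholder, so a set is a natural container for the names. Membership tests with `in` behave the same way. The Python 3 sketch below also skips empty entries; without that step, a trailing comma in the file would add an empty-string key. The names `parse_names` and `load_names` are hypothetical.

```python
def parse_names(text):
    """Turn comma-separated names into a set, dropping blanks and surrounding whitespace."""
    return {name.strip() for name in text.split(",") if name.strip()}

def load_names(path):
    """Read a comma-separated names file (path is up to the caller)."""
    with open(path, encoding="utf-8") as f:
        return parse_names(f.read())

names = parse_names("John, Mary,\nAhmed, ")
print(sorted(names))  # ['Ahmed', 'John', 'Mary']
print("Mary" in names)  # True
```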
Computing Accuracy Results I
# anfiles.py
# Program to analyze the results of speaker identification.
# Illustrates Python dictionaries.
import sys

def main():
    # read the correct (reference) file and the test file
    fname1 = sys.argv[1]
    fname2 = sys.argv[2]
    text1 = open(fname1, 'r').read().lower()
    words1 = text1.split()
    correct_len = len(words1)
    text2 = open(fname2, 'r').read().lower()
    words2 = text2.split()
Computing Accuracy Results II
    # construct a dictionary of correct results
    correct = {}
    for w in words1:
        correct[w] = 1
    # score each prefix of the test output; stop at the shorter list so no row
    # repeats and an empty test file cannot cause a division by zero
    for i in range(min(correct_len, len(words2))):
        in_count = 0
        portion2 = words2[:i+1]
        for w in portion2:
            if correct.get(w, 0) > 0:
                in_count += 1
        accuracy = float(in_count) / float(len(portion2))
        print("%5d, %5d, %.2f" % (len(portion2), in_count, accuracy))

if __name__ == '__main__':
    main()
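The script above recounts the hits for every prefix, so the work grows quadratically. The Python 3 sketch below keeps a running total instead and uses a set for the reference words. It stops at the shorter of the two word lists. The function name `prefix_accuracy` is hypothetical.

```python
def prefix_accuracy(reference_words, test_words):
    """For each prefix of test_words, return (prefix length, hits, hits / prefix length).

    A "hit" is a test word that also appears anywhere in reference_words.
    """
    correct = set(reference_words)
    rows = []
    hits = 0
    limit = min(len(reference_words), len(test_words))
    for n, word in enumerate(test_words[:limit], start=1):
        if word in correct:
            hits += 1  # running total, so each word is checked only once
        rows.append((n, hits, hits / n))
    return rows

for n, hits, acc in prefix_accuracy("a b c d".split(), "a x c".split()):
    print("%5d, %5d, %.2f" % (n, hits, acc))
```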
Word Histograms
import re

pattern = re.compile(r'[a-zA-Z]+')

def countwords(text):
    counts = {}
    for match in pattern.finditer(text):
        word = match.group()
        counts[word] = counts.get(word, 0) + 1
    # build (count, word) pairs and sort them, most frequent first
    items = []
    for word in counts.keys():
        items.append((counts[word], word))
    items.sort()
    items.reverse()
    return items

# if run as a script, count the words read from stdin
if __name__ == "__main__":
    import sys
    x = countwords(sys.stdin.read())
    print("\n".join(map(str, x)))
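In Python 3 the counting step is a natural fit for `collections.Counter`. The sketch below is an alternative under that assumption, not the original author's code. It sorts by descending count and then alphabetically, so tied words come out A to Z. By contrast, `sort()` followed by `reverse()` lists tied words in reverse alphabetical order. The name `count_words` is hypothetical.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-zA-Z]+")

def count_words(text):
    """Return (count, word) pairs, most frequent first; ties are broken alphabetically."""
    counts = Counter(m.group() for m in WORD.finditer(text))
    return sorted(((n, w) for w, n in counts.items()), key=lambda pair: (-pair[0], pair[1]))

for pair in count_words("the cat and the hat and the bat"):
    print(pair)
```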
Extracting People Names and Company Names
import re, sys

def createNameDict():
    with open('names.txt', 'r') as dictNameFile:
        dictContent = dictNameFile.read()   # read the whole file
    dictWords = dictContent.split(",")      # list of the names
    nameDict = {}                           # initialize a dictionary
    for word in dictWords:
        nameDict[word.strip()] = " "        # add each name to the dictionary
    return nameDict

def main():
    # read the input file
    fname1 = sys.argv[1]
    text1 = open(fname1, 'r').read()
    namesDic = createNameDict()
    # Company: capitalized words followed by a legal-form suffix. Longer suffixes come
    # first so "Corporation" is not cut short to "Corp"; \b stops "ag" or "sa" from
    # matching the start of a longer word.
    pattern = re.compile(r'([A-Z]\w+[ .,-]+)+'
                         r'(corporation|Corporation|CORPORATION|corp|Corp|CORP|'
                         r'ltd|Ltd|LTD|inc|Inc|INC|gmbh|GmbH|GMBH|ag|AG|sa|SA)\b'
                         r'(\.?)')
    # Person: two to four consecutive capitalized words
    pattern1 = re.compile(r'([A-Z]\w+[\s.-]*){2,4}')
    # Companies
    for match in pattern.finditer(text1):
        print(match.group())
    # People: keep a sequence only if its first word is a known first name
    for match in pattern1.finditer(text1):
        wordList = match.group().split()
        if wordList[0].strip() in namesDic:
            print(match.group())

if __name__ == '__main__':
    main()
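The script reads its input from files. The Python 3 sketch below applies the same two patterns to plain strings, which makes them easier to try out. It also replaces the spelled-out case variants with a scoped inline flag, `(?i:...)`, available since Python 3.6. That flag makes only the suffix case-insensitive, so `[A-Z]` still requires a capital letter. The names `COMPANY`, `PERSON`, `find_companies` and `find_people` are hypothetical.

```python
import re

# Capitalized words followed by a legal-form suffix; only the suffix ignores case.
COMPANY = re.compile(
    r'(?:[A-Z]\w+[ .,-]+)+'
    r'(?i:corporation|corp|ltd|inc|gmbh|ag|sa)\b\.?'
)
# Two to four capitalized words in a row.
PERSON = re.compile(r'(?:[A-Z]\w+[\s.-]*){2,4}')

def find_companies(text):
    return [m.group() for m in COMPANY.finditer(text)]

def find_people(text, first_names):
    """Keep capitalized sequences whose first word is a known first name."""
    people = []
    for m in PERSON.finditer(text):
        words = m.group().split()
        if words[0] in first_names:
            people.append(m.group().strip())
    return people

print(find_companies("Shares of Acme Widgets Corp. rose."))  # ['Acme Widgets Corp.']
print(find_people("We met John Smith today.", {"John"}))     # ['John Smith']
```

Like the original, this matches greedily. A capitalized word at the start of a sentence ("Yesterday John Smith") therefore becomes part of the sequence, and the first-name check then rejects the whole sequence.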
NLTK
• NLTK defines a basic infrastructure that can be used to build NLP programs in Python. It provides:
  • Basic classes for representing data relevant to natural language processing.
  • Standard interfaces for performing tasks such as tokenization, tagging, and parsing.
  • Standard implementations for each task, which can be combined to solve complex problems.
  • Extensive documentation, including tutorials and reference documentation.
RE Show
re_show() prints the string with every match of the regular expression wrapped in braces:
>>> from nltk.util import re_show
>>> string = """
... It’s probably worth paying a premium for funds that invest in markets
... that are partially closed to foreign investors, such as South Korea,
... ...
... """
>>> re_show('t...', string)

I{t’s }probably wor{th p}aying a premium for funds {that} inves{t in} markets
{that} are par{tial}ly closed {to f}oreign inves{tors}, such as Sou{th K}orea,
...
>>>
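Below is a minimal standard-library stand-in for `re_show`, useful for experimenting without NLTK installed. It is a sketch and not NLTK's implementation. It returns the marked-up string instead of printing it, and it wraps each non-overlapping match that `re.sub` finds. The name `show_matches` is hypothetical.

```python
import re

def show_matches(pattern, text, left="{", right="}"):
    """Return text with every non-overlapping match of pattern wrapped in left/right."""
    return re.sub(pattern, lambda m: left + m.group() + right, text)

print(show_matches("t...", "It's probably worth paying"))
# I{t's }probably wor{th p}aying
```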
Defining Classes
>>> class SimpleClass:
...     def __init__(self, initial_value):
...         self.data = initial_value
...     def set(self, value):
...         self.data = value
...     def get(self):
...         print(self.data)
...
>>> x = SimpleClass(4)
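A short Python 3 usage sketch follows. It differs from the session above in one way: `get` returns the value instead of printing it, so callers can compute with the result. It also shows that every instance keeps its own `data` attribute.

```python
class SimpleClass:
    def __init__(self, initial_value):
        self.data = initial_value  # each instance gets its own data attribute

    def set(self, value):
        self.data = value

    def get(self):
        return self.data  # return rather than print, so the value can be reused

x = SimpleClass(4)
y = SimpleClass("other")
x.set(x.get() + 1)
print(x.get(), y.get())  # 5 other
```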
Inheritance
B is a subclass of A:
>>> class B(A):
...     def __init__(self):
...         ...
SimpleTokenizer implements the interface of TokenizerI:
>>> class SimpleTokenizer(TokenizerI):
...     def tokenize(self, text):
...         words = text.split()
...         return [Token(words[i], Location(i))
...                 for i in range(len(words))]
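The `TokenizerI`, `Token` and `Location` classes in the session come from an old NLTK release. The self-contained Python 3 sketch below shows the same "interface plus implementation" pattern with the standard `abc` module. The `TokenizerI` and `Token` defined here are hypothetical stand-ins, not NLTK's classes, and the word position is stored as a plain integer rather than a `Location` object.

```python
from abc import ABC, abstractmethod
from collections import namedtuple

# Hypothetical stand-in for a token: the word and its position in the text.
Token = namedtuple("Token", ["text", "position"])

class TokenizerI(ABC):
    """Interface: subclasses must implement tokenize()."""

    @abstractmethod
    def tokenize(self, text):
        ...

class SimpleTokenizer(TokenizerI):
    def tokenize(self, text):
        return [Token(word, i) for i, word in enumerate(text.split())]

print(SimpleTokenizer().tokenize("the cat sat"))
```

Marking `tokenize` with `@abstractmethod` means the interface class itself cannot be instantiated. Python raises `TypeError` instead of failing later when `tokenize` is called.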
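Inheritance Example
from math import floor, sqrt

class Point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

class Cartesian(Point):
    def distanceToOrigin(self):
        # straight-line (Euclidean) distance, rounded down to an integer
        return floor(sqrt(self.x**2 + self.y**2))

class Manhattan(Point):
    def distanceToOrigin(self):
        # city-block distance; abs() keeps negative coordinates from cancelling out
        return abs(self.x) + abs(self.y)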
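The usage sketch below redefines the three classes so it runs on its own. It uses CapWords names and `abs()` in the Manhattan distance. Both subclasses inherit `__init__` from the base class, and the same `distanceToOrigin()` call dispatches to each subclass's own version.

```python
from math import floor, sqrt

class Point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

class Cartesian(Point):
    def distanceToOrigin(self):
        return floor(sqrt(self.x ** 2 + self.y ** 2))

class Manhattan(Point):
    def distanceToOrigin(self):
        return abs(self.x) + abs(self.y)

points = [Cartesian(3, 4), Manhattan(3, -4), Cartesian()]  # Cartesian() uses the inherited defaults
print([p.distanceToOrigin() for p in points])  # [5, 7, 0]
```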
Sets in Python
• The sets module provides classes for constructing and manipulating unordered collections of unique elements. Common uses include:
  • membership testing,
  • removing duplicates from a sequence,
  • and computing standard mathematical operations on sets, such as intersection, union, difference, and symmetric difference.
• Like other collections, sets support x in set, len(set), and for x in set. Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior.
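The sets module was deprecated in Python 2.6 and removed in Python 3. The built-in `set` type, available since Python 2.4, covers the same uses. The Python 3 sketch below walks through the uses listed above. The sample data is made up for illustration.

```python
words = ["spam", "eggs", "spam", "bacon", "eggs"]
unique = set(words)                   # removing duplicates
print("spam" in unique, len(unique))  # membership and size: True 3
for word in sorted(unique):           # iteration; sorted because sets are unordered
    print(word)

a = {"John", "Jane", "Jack"}
b = {"Jack", "Sam"}
print(sorted(a | b))  # union: ['Jack', 'Jane', 'John', 'Sam']
print(sorted(a & b))  # intersection: ['Jack']
print(sorted(a - b))  # difference: ['Jane', 'John']
print(sorted(a ^ b))  # symmetric difference: ['Jane', 'John', 'Sam']
```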
Some Details about Implementation
• Most set applications use the Set class, which provides every set method except __hash__(). For advanced applications requiring a hash method, the ImmutableSet class adds a __hash__() method but omits the methods that alter the contents of the set.
• The set classes are implemented using dictionaries. As a result, sets cannot contain mutable elements such as lists or dictionaries.
• However, they can contain immutable collections such as tuples or instances of ImmutableSet.
• For convenience in implementing sets of sets, inner sets are automatically converted to immutable form; for example, Set([Set(['dog'])]) is transformed to Set([ImmutableSet(['dog'])]).
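With the built-in types, `frozenset` plays the role of `ImmutableSet`. Unlike `sets.Set`, the built-in `set` does not convert inner sets automatically. Placing a `set` inside a `set` raises `TypeError`, so sets of sets must be built from frozensets explicitly. The helper `can_be_set_element` below is hypothetical and exists only to make the rule visible.

```python
def can_be_set_element(value):
    """Return True if value can be stored in a set (i.e. it is hashable)."""
    try:
        _ = {value}
    except TypeError:
        return False
    return True

print(can_be_set_element(frozenset(["dog"])))  # True
print(can_be_set_element(("dog", "cat")))      # True: tuples of immutables are hashable
print(can_be_set_element(set(["dog"])))        # False: no automatic conversion
print(can_be_set_element(["dog"]))             # False: lists are mutable

sets_of_sets = {frozenset(["dog"]), frozenset(["cat", "dog"])}
print(frozenset(["dog"]) in sets_of_sets)      # True
```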
Set Examples
>>> from sets import Set
>>> engineers = Set(['John', 'Jane', 'Jack', 'Janice'])
>>> programmers = Set(['Jack', 'Sam', 'Susan', 'Janice'])
>>> managers = Set(['Jane', 'Jack', 'Susan', 'Zack'])
>>> employees = engineers | programmers | managers            # union
>>> engineering_management = engineers & managers             # intersection
>>> fulltime_management = managers - engineers - programmers  # difference
>>> engineers.add('Marvin')                                   # add element
>>> print engineers
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
>>> employees.issuperset(engineers)                           # superset test
False
Set Examples
>>> employees.union_update(engineers)   # update from another set
>>> employees.issuperset(engineers)
True
>>> for group in [engineers, programmers, managers, employees]:
...     group.discard('Susan')          # unconditionally remove element
...     print group
...
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
Set(['Janice', 'Jack', 'Sam'])
Set(['Jane', 'Zack', 'Jack'])
Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack'])
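Here is the same session as a Python 3 sketch with the built-in `set`. The in-place union operator `|=` (or `set.update`) stands in for `Set.union_update`. Each group is printed in sorted order because built-in sets do not guarantee any particular display order.

```python
engineers = {"John", "Jane", "Jack", "Janice"}
programmers = {"Jack", "Sam", "Susan", "Janice"}
managers = {"Jane", "Jack", "Susan", "Zack"}

employees = engineers | programmers | managers            # union
engineering_management = engineers & managers             # intersection
fulltime_management = managers - engineers - programmers  # difference

engineers.add("Marvin")
print(employees.issuperset(engineers))  # False: Marvin was added after employees was built

employees |= engineers                  # in-place union, like Set.union_update
print(employees.issuperset(engineers))  # True

for group in [engineers, programmers, managers, employees]:
    group.discard("Susan")              # no error if the element is absent
    print(sorted(group))
```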
Google API
• Get it from http://sourceforge.net/projects/pygoogle/
• A Python wrapper for the Google Web API. It allows you to do Google searches, retrieve pages from the Google cache, and ask Google for spelling suggestions.
Utilizing the Google API - I
import sys
import codecs
import google

print('<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">')
print('<head>')
print(' <title>Google with Python</title>')
print('</head>')
print('<body>')
print('<h1>Google with Python</h1>')
google.LICENSE_KEY = '[YOUR GOOGLE LICENSE KEY]'
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)   # write Unicode results as UTF-8
query = "Your Query"
data = google.doGoogleSearch(query)
Utilizing the Google API - II
print('<p><strong>1-10 of "' + query + '" total results for ')
print(str(data.meta.estimatedTotalResultsCount) + '</strong></p>')
for result in data.results:
    # convert the HTML 4 <b> and <br> tags in titles and snippets to XHTML
    title = result.title
    title = title.replace('<b>', '<strong>')
    title = title.replace('</b>', '</strong>')
    snippet = result.snippet
    snippet = snippet.replace('<b>', '<strong>')
    snippet = snippet.replace('</b>', '</strong>')
    snippet = snippet.replace('<br>', '<br />')
    print('<h2><a href="' + result.URL + '">' + title + '</a></h2>')
    print('<p>' + snippet + '</p>')
print('</body>')
print('</html>')
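The repeated `replace` calls can be collected into one helper. The user's query should also be HTML-escaped before it is written into the page, because the slide code inserts it raw. The Python 3 sketch below makes no network calls. The names `to_xhtml` and `results_heading` are hypothetical, and `html.escape` is the standard-library function (Python 3.2+).

```python
import html

def to_xhtml(fragment):
    """Convert the HTML 4 tags found in result titles/snippets to XHTML."""
    return (fragment.replace("<b>", "<strong>")
            .replace("</b>", "</strong>")
            .replace("<br>", "<br />"))

def results_heading(query, total):
    """Build the summary line; html.escape keeps the query from injecting markup."""
    return ('<p><strong>1-10 of "' + html.escape(query) + '" total results for '
            + str(total) + '</strong></p>')

print(to_xhtml("<b>Python</b> tutorial<br>"))
print(results_heading("fish & chips", 42))
```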
Yahoo API
• http://pysearch.sourceforge.net/
• http://python.codezoo.com/pub/component/4193?category=198
• This project implements a Python API for the Yahoo Search Web Services API. pYsearch is an object-oriented abstraction of the web services, with emphasis on ease of use and extensibility.
URLLIB
• This module provides a high-level interface for fetching data across the World Wide Web.
• In particular, the urlopen() function is similar to the built-in function open(), but accepts Uniform Resource Locators (URLs) instead of filenames.
• Some restrictions apply: it can only open URLs for reading, and no seek operations are available.
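In Python 3, `urlopen()` lives in `urllib.request`. The sketch below illustrates the "like `open()`, but with URLs" point without needing network access. It writes a small temporary file, opens it through a `file:` URL, and also reads a `data:` URL (supported since Python 3.4). The helper name `read_url` is hypothetical.

```python
import tempfile
import urllib.request
from pathlib import Path

def read_url(url):
    """Fetch a URL with urllib.request and return the body decoded as UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

# Write a local file, then open it through a file: URL instead of a filename.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("hello from a file URL")
url = Path(tmp.name).resolve().as_uri()
print(read_url(url))

# A data: URL carries its (percent-encoded) content inside the URL itself.
print(read_url("data:,Hello%2C%20World!"))  # Hello, World!
```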
Urllib Syntax
• Use http://www.someproxy.com:3128 for HTTP proxying:
    proxies = {'http': 'http://www.someproxy.com:3128'}
    filehandle = urllib.urlopen(some_url, proxies=proxies)
• Don't use any proxies:
    filehandle = urllib.urlopen(some_url, proxies={})
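Python 3's `urllib.request.urlopen()` has no `proxies` argument. A comparable setup uses `ProxyHandler` with `build_opener`, as in the sketch below. Only the openers are built; nothing is sent. `some_url` remains a placeholder, as on the slide.

```python
import urllib.request

# Same proxy as on the slide; the dict maps URL scheme -> proxy URL.
proxy_handler = urllib.request.ProxyHandler({"http": "http://www.someproxy.com:3128"})
proxy_opener = urllib.request.build_opener(proxy_handler)

# An empty dict means "no proxies", so proxy environment variables are ignored.
no_proxy_handler = urllib.request.ProxyHandler({})
direct_opener = urllib.request.build_opener(no_proxy_handler)

# Fetching would then look like: proxy_opener.open(some_url)
```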
URLLIB Examples
• Here is an example session that uses the "GET" method to retrieve a URL containing parameters:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read()
• The following example uses the "POST" method instead:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()
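Here is a Python 3 sketch of the same two requests. `urlencode` now lives in `urllib.parse`, and a POST body must be bytes. The `Request` objects are only constructed, not sent, so the example runs without network access. Passing either one to `urllib.request.urlopen()` would send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = urlencode({"spam": 1, "eggs": 2, "bacon": 0})  # 'spam=1&eggs=2&bacon=0'

# GET: the parameters go in the query string.
get_req = Request("http://www.musi-cal.com/cgi-bin/query?%s" % params)

# POST: the parameters go in the request body, encoded as bytes.
post_req = Request("http://www.musi-cal.com/cgi-bin/query", data=params.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```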
What is a Proxy
• A proxy server is a computer that offers a computer network service to allow clients to make indirect network connections to other network services.
• A client connects to the proxy server, then requests a connection, file, or other resource available on a different server.
• The proxy provides the resource either by connecting to the specified server or by serving it from a cache.
• In some cases, the proxy may alter the client's request or the server's response for various purposes.
• A proxy server can also serve as a firewall.