A pair of sometimes useful functions
• Function ord returns a character’s ordinal, i.e. its character code (Unicode code point)
• Function chr returns the character with the given character code

>>> ord('ff')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: ord() expected a character, but string of length 2 found
>>> ord('f')
102
>>> ord('.')
46
>>> chr(46)
'.'
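In Python 3 every str is a Unicode string, so ord and chr also cover characters beyond ASCII, such as the Danish letters æ, ø and å. A small sketch; the helper name code_points is hypothetical, introduced only for illustration:

```python
def code_points(text):
    """Return the Unicode code point of each character in text (hypothetical helper)."""
    return [ord(ch) for ch in text]

# ord and chr are inverses: converting to a code and back gives the same character
for ch in "f.æøå":
    assert chr(ord(ch)) == ch

print(code_points("æøå"))  # [230, 248, 229]
print(chr(102))            # f
```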
String searching using find
Danish Intelligence Agency memo
Concerning: incident where activists threw red paint at Prime Minister Anders Fogh Rasmussen
Task: improve electronic surveillance to avoid such incidents in the future
surveillance.py
• Strings are immutable: string manipulation methods return new strings
• find returns the index of the first occurrence of a word, starting the search at startindex
• Print the substring around a suspicious word without running past either end of the string
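The program itself is not included here. As a rough Python 3 sketch of the same steps, assume a hypothetical function surveil that flags a text only when all suspicious words occur in it, and a hypothetical margin of characters shown on each side of every hit (the window size used on the slide is not known):

```python
def surveil(text, words, margin=25):
    """Return the contexts of suspicious words in text (empty list if text is okay).

    Hypothetical sketch: the text counts as suspicious only if every word in
    words occurs in it; each context is margin characters on either side.
    """
    if not all(text.find(word) != -1 for word in words):
        print("Not all words found, text okay")
        return []
    print("All words found, text is suspicious!")
    snippets = []
    for word in words:
        start = text.find(word)                  # index of first occurrence, or -1
        while start != -1:
            left = max(0, start - margin)        # a negative start index would wrap around
            right = start + len(word) + margin   # slicing stops at the end of text by itself
            snippets.append(text[left:right])
            print(text[left:right])
            start = text.find(word, start + 1)   # next occurrence after this one
    return snippets
```

Because find matches substrings, a word such as "paintings" also counts as a hit for "paint".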
surveillancetest.py, output:
Not all words found, text okay
All words found, text is suspicious!
ting by Douglas Coupland was sold for 100.000.000
a slight change of plans, the prime minister atten
All words found, text is suspicious!
fice. He hides the paint behind a plant. Tuesday m
 him and throw the paint. They keep attacking him
, George and Ringo attack him and throw the paint.
e paint. They keep attacking him until they're arr
y're arrested. The attack should take place at 10a
Here's the plan: Paul breaks into Christia
the paint behind a plant. Tuesday morning before t

Note: find also reports words that merely contain a suspicious word (e.g. "attacking" for "attack"); here that may be important.
More string methods: splitlines, join, replace
Parents Music Resource Center memo
Concerning: crude language in much of today’s music
Task: implement censorship to remove bad words
censorship.py
• Split the text into a list of lines
• In each line, replace each bad word with BEEP
• If any words were BEEPed, print the line and play one beep per word
• Join the censored lines with newlines and return the full text
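censorship.py itself is not reproduced here. A minimal Python 3 sketch of the four steps above, with a hypothetical function censor; the terminal bell character "\a" stands in for the beep and may be silent on many terminals, and "darn" is just an example bad word:

```python
def censor(text, bad_words):
    """Replace every bad word in text with BEEP; return (censored_text, total_beeps)."""
    censored_lines = []
    total = 0
    for line in text.splitlines():               # split text into a list of lines
        beeps = 0
        for word in bad_words:
            beeps += line.count(word)            # occurrences about to be replaced
            line = line.replace(word, "BEEP")    # replace returns a new string
        if beeps > 0:
            print(line)
            print("\a" * beeps, end="")          # one terminal bell per BEEPed word
            total += beeps
        censored_lines.append(line)
    print("Beeped words:", total)
    return "\n".join(censored_lines), total      # join censored lines with newlines

censor("darn it\ndarning socks", ["darn"])
```

Since replace works on substrings, "darning" becomes "BEEPing" here, the same effect as the Celine Dion line in the test output.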
Celine Dion: With each moment, moment pBEEPing by Beeped words: 1 Crime Mob: Ol' stankin BEEP (Hoe) Jank BEEP (Hoe) Suck my BEEP you (Hoe) Ol' fat BEEP (Hoe) But aiight! We finna get these lame BEEP niggaz You see a hoe BEEP nigga, call his BEEP out. Aye! Aye! Stomp his BEEP like (Hoe) Ol' lame BEEP (Hoe) I'ma tell you how it is nigga you betta get the BEEP back cause a nigga like me don't give a BEEP A nigga suppose to gon leave yo BEEP choked You sound like a BEEP yo BEEP I'ma hit we don't give a BEEP cause you is a lame One hitter quitter yo BEEP get popped Back the BEEP up 'fore I show you who reala Whats up wit ya BEEP nigga Ol' sucka BEEP, busta BEEP, cryin to yo momma BEEP I'ma keep up drama I'm a muthaBEEPin plum BEEP See you just a dumb BEEP go on wit yo young BEEP Try me like a sucka but I know you just a lame BEEP In my section they glad to see a nigga that don't give a BEEP Stomp you to the floor and tell you get yo pussy BEEP up Pick that nigga BEEP up, tear his lame BEEP up Niggaz representin Ellenwood time to mBEEP up Throwin blows like Johnny Cage, you think you wanna BEEP wit me Do this BEEP like Pastor Troy Uuh Huh I'm outside hoe Take my BEEPin word I ain't got no reason to lie hoe Beeped words: 34 We find words containing a suspicious word: not desirable here. See exercise. Program tested on two songs by Celine Dion and Crime Mob :
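In the Celine Dion output, "passing" became "pBEEPing" because replace matches substrings. One possible fix (not necessarily the one the exercise intends) is to replace only whole words, here with a regular expression using the word-boundary marker \b. The function name censor_whole_words and the example word "darn" are hypothetical:

```python
import re

def censor_whole_words(text, bad_words):
    """Replace bad words only where they occur as whole words; return (text, count)."""
    if not bad_words:
        return text, 0          # an empty alternation would match at every boundary
    # \b is a word boundary; re.escape treats each bad word as literal text
    alternatives = "|".join(re.escape(word) for word in bad_words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return pattern.subn("BEEP", text)   # (new_text, number_of_replacements)

print(censor_whole_words("darn it\ndarning socks", ["darn"]))
# ('BEEP it\ndarning socks', 1)
```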
Regular Expressions – Motivation
Problem: search suspicious text for any Danish email address: <something>@<something>.dk

text1 = "No Danish email here bush@whitehouse.org *@$@.hls.29! fj3a"
text2 = "But here: chili@daimi.au.dk what a *(.@#$ nice @#*.( el ds"
text3 = "And here perhaps? rubbish@junk.garbage@bogus@dk @.dk a@.dk"

• Cumbersome using ordinary string methods
RegExp solution (to be explained later)
Output: Text2 contains this Danish email address: chili@daimi.au.dk
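The slide's own regular expression is not reproduced here. The following Python 3 sketch uses an assumed pattern of our own: one or more word characters, optionally separated by dots, on each side of a single @, with the address ending in .dk. It ignores hyphens and other characters that real addresses may contain, and find_danish_email is a hypothetical name:

```python
import re

# Assumed pattern: dot-separated words, @, dot-separated words ending in ".dk"
DANISH_EMAIL = re.compile(r"\b\w+(?:\.\w+)*@\w+(?:\.\w+)*\.dk\b")

def find_danish_email(text):
    """Return the first Danish-looking email address in text, or None."""
    match = DANISH_EMAIL.search(text)
    return match.group() if match else None

text1 = "No Danish email here bush@whitehouse.org *@$@.hls.29! fj3a"
text2 = "But here: chili@daimi.au.dk what a *(.@#$ nice @#*.( el ds"
text3 = "And here perhaps? rubbish@junk.garbage@bogus@dk @.dk a@.dk"

for name, text in [("Text1", text1), ("Text2", text2), ("Text3", text3)]:
    address = find_danish_email(text)
    if address:
        print(name, "contains this Danish email address:", address)
```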
Regular Expressions
• Provide a more powerful alternative to the plain string search methods
• Instead of searching for a specific string, we can search for a text pattern
• We don’t have to search explicitly for ‘Monday’, ‘Tuesday’, ‘Wednesday’, ...: there is a pattern in these search strings
• A regular expression is a text pattern
• In Python, regular expression processing is provided by the module re
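For instance, all weekday names end in "day". A small Python 3 sketch that finds them with one pattern; it uses alternation (|) and a non-capturing group (?:...), which these slides have not introduced yet, and the sample sentence is made up:

```python
import re

# One pattern covers all seven weekday names instead of seven separate searches
WEEKDAY = re.compile(r"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")

text = "The attack was moved from Tuesday to Friday morning."
print(WEEKDAY.findall(text))  # ['Tuesday', 'Friday']
```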
Example
Simple regular expression: regExp = "football" matches only the string "football"
To search a text for regExp, we can use re.search(regExp, text)
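re.search returns a match object when it succeeds and None otherwise. A brief Python 3 sketch of what that object records; group, start, end and span are standard methods of match objects:

```python
import re

text = "Here are the football results: Bosnia - Denmark 0-7"
match = re.search("football", text)
if match:
    # The match object records what matched and where
    print(match.group())               # football
    print(match.start(), match.end())  # 13 21
print(re.search("football", "no match here"))  # None
```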
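Compiling Regular Expressions
re.search(regExp, text) does two things:
• Compiles regExp to a special format (an SRE_Pattern object; the type is called re.Pattern from Python 3.7 on)
• Searches for this SRE_Pattern in text; the result is an SRE_Match object (re.Match from Python 3.7 on), or None if nothing is found
If we need to search for regExp several times, it can be more efficient to compile it once and for all (the re module also caches recently used patterns, so the gain is often modest):
compiledRE = re.compile(regExp)
1. Now compiledRE is an SRE_Pattern object
2. compiledRE.search(text) uses the search method of this SRE_Pattern to search text
3. The result is the same kind of SRE_Match object that re.search returns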
Searching for ‘football’

import re

text1 = "Here are the football results: Bosnia - Denmark 0-7"
text2 = "We will now give a complete list of python keywords."

regularExpression = "football"
# Compile the regular expression and get the SRE_Pattern object
compiledRE = re.compile(regularExpression)

# Use the same SRE_Pattern object to search both texts and get two
# SRE_Match objects (or None if the search was unsuccessful)
SRE_Match1 = compiledRE.search(text1)
SRE_Match2 = compiledRE.search(text2)

if SRE_Match1:
    print("Text1 contains the substring 'football'")
if SRE_Match2:
    print("Text2 contains the substring 'football'")

Output:
Text1 contains the substring 'football'
Building more sophisticated patterns
Metacharacters:
• ?: matches zero or one occurrence of the expression it follows
• +: matches one or more occurrences of the expression it follows
• *: matches zero or more occurrences of the expression it follows

# search for zero or one t, followed by two a’s:
regExp1 = "t?aa"
# search for g followed by one or more c’s followed by one a:
regExp2 = "gc+a"
# search for ct followed by zero or more g’s followed by one a:
regExp3 = "ctg*a"
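To see exactly what each metacharacter accepts, one can test whole strings with re.fullmatch (Python 3.4+), which succeeds only if the entire string matches. The helper accepts and the candidate strings are arbitrary illustrations. Note that search would still find t?aa inside "ttaa", since search accepts a match anywhere in the text:

```python
import re

def accepts(reg_exp, s):
    """True if the whole string s matches reg_exp (hypothetical helper)."""
    return re.fullmatch(reg_exp, s) is not None

examples = {
    "t?aa":  ["aa", "taa", "ttaa"],     # at most one t before "aa"
    "gc+a":  ["ga", "gca", "gccca"],    # at least one c between g and a
    "ctg*a": ["cta", "ctga", "ctggga"], # any number of g's, including none
}
for reg_exp, candidates in examples.items():
    for s in candidates:
        result = "matches" if accepts(reg_exp, s) else "no match"
        print(f"{reg_exp!r:8} {s!r:10} {result}")
```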
Use the SRE_Pattern objects to search the text and get SRE_Match objects:
Text contains the regular expression t?aa
Text contains the regular expression gc+a
Text contains the regular expression ctg*a
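The searching program itself is not included here. A Python 3 sketch with a hypothetical helper patterns_found and a made-up DNA-like text (the slide's text is not recoverable), compiling each pattern and searching with the resulting Pattern object:

```python
import re

def patterns_found(text, reg_exps):
    """Return the regular expressions (as strings) that occur somewhere in text."""
    found = []
    for reg_exp in reg_exps:
        compiled_re = re.compile(reg_exp)   # Pattern object
        if compiled_re.search(text):        # Match object, or None
            print("Text contains the regular expression", reg_exp)
            found.append(reg_exp)
    return found

regExp1 = "t?aa"
regExp2 = "gc+a"
regExp3 = "ctg*a"
patterns_found("ttaagccactgga", [regExp1, regExp2, regExp3])
```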