Python Regular Expressions Easy text processing
Regular Expression • A way of identifying certain string patterns • Formally, a RE is: • a letter or lambda (the empty string) • RE1 RE2 (concatenate 2 RE's) • (RE or RE) • (RE)* • Why do you think they're called Regular Expressions?
Python regex • Use the re module • import re • The special characters: . ^ $ * + ? { } [ ] \ | ( ) • We’ll learn them one at a time…
Character classes • [abc] means a or b or c • [a-c] is the same thing • [a-z] = any lowercase letter • [^579] = any character except 5, 7, or 9 • To choose between whole strings rather than single characters, use |: Shannon|Duvall matches Shannon or Duvall
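The character-class rules above can be checked directly. This is a minimal sketch; the helper name `matches` is made up for illustration, and it uses `re.fullmatch` so that the whole test string has to fit the pattern.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text fits pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# [abc] and [a-c] accept exactly the same single characters.
same_class = matches(r'[abc]', 'b') and matches(r'[a-c]', 'b')
outside_range = matches(r'[a-c]', 'd')

# [^579] accepts any one character other than 5, 7, or 9.
excluded = matches(r'[^579]', '7')
allowed = matches(r'[^579]', 'x')

# | chooses between whole strings.
either_name = matches(r'Shannon|Duvall', 'Duvall')

print(same_class, outside_range, excluded, allowed, either_name)
# True False False True True
```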
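Metacharacters • \d any digit [0-9] • \D any non-digit [^0-9] • \s any whitespace character (space, tab, newline, and so forth) • \S any non-whitespace character • \w any alphanumeric character or underscore • \W any character that \w does not match • \b a word boundary (a position, not a character) • . anything except newline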
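A short sketch of the metacharacters in action, using `re.findall` to list everything each one matches. The sample text is invented for the illustration.

```python
import re

text = "Room 42\tis open"   # made-up sample containing a tab

digits = re.findall(r'\d', text)        # ['4', '2']
whitespace = re.findall(r'\s', text)    # [' ', '\t', ' ']
non_word = re.findall(r'\W', text)      # [' ', '\t', ' ']
words = re.findall(r'\b\w+\b', text)    # ['Room', '42', 'is', 'open']

# The dot skips the newline and matches everything else.
dot = re.findall(r'.', "a\nb")          # ['a', 'b']

print(digits, whitespace, non_word, words, dot)
```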
Repeat • * means 0 or more: ma*d matches md, mad, and maaaaad • + means 1 or more: ma+d matches mad and maaaaad but not md • ? means 0 or 1: ma?d matches md and mad only • {x,y} means between x and y repetitions, inclusive: ma{1,3}d matches mad, maad, and maaad
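The four repetition operators can be compared side by side on one list of candidate strings. This is a sketch; the list and the helper `accepted` are made up for the illustration.

```python
import re

candidates = ["md", "mad", "maad", "maaad", "maaaaad"]

def accepted(pattern):
    """List the candidates that the pattern matches in full (hypothetical helper)."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = accepted(r'ma*d')         # every candidate
plus = accepted(r'ma+d')         # everything except 'md'
optional = accepted(r'ma?d')     # ['md', 'mad']
bounded = accepted(r'ma{1,3}d')  # ['mad', 'maad', 'maaad']

print(star, plus, optional, bounded, sep="\n")
```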
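Repeating groups • [ab]* repeats the character class: it matches a, b, bbb, and any other mix of a's and b's • (ab)* repeats the whole group: it matches ab, abab, ababab • Both also match the empty string, since * allows 0 repetitions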
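To see the difference between repeating a class and repeating a group, one option is to run both patterns over the same strings. The candidate list below is invented for the illustration.

```python
import re

candidates = ["a", "b", "bbb", "ab", "abab", "ababab", "aba"]

# [ab]* repeats "a or b", so any mix of the two letters fits.
class_star = [s for s in candidates if re.fullmatch(r'[ab]*', s)]

# (ab)* repeats the two-letter unit "ab", so only whole copies fit.
group_star = [s for s in candidates if re.fullmatch(r'(ab)*', s)]

print(class_star)  # all seven candidates
print(group_star)  # ['ab', 'abab', 'ababab']
```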
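More metacharacters • ^ outside of a character class, matches the beginning of the string (with the re.MULTILINE flag, the beginning of every line) • $ matches the end of the string (with re.MULTILINE, the end of every line)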
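A small sketch of how the anchors behave on a two-line string, with and without the `re.MULTILINE` flag. The sample text is made up.

```python
import re

text = "cats\ndogs"   # two lines in one string

# Without a flag, ^ and $ anchor to the string as a whole.
start_default = re.findall(r'^\w+', text)                  # ['cats']
end_default = re.findall(r'\w+$', text)                    # ['dogs']

# With re.MULTILINE, they anchor to every line.
start_multiline = re.findall(r'^\w+', text, re.MULTILINE)  # ['cats', 'dogs']
end_multiline = re.findall(r'\w+$', text, re.MULTILINE)    # ['cats', 'dogs']

print(start_default, end_default, start_multiline, end_multiline)
```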
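What can I do with them? Search • re.search(pattern, string, <flags>) • pattern is the regex • string is the text you are searching in • flags are optional special modifiers • Returns a Match object if the pattern is found anywhere in the string, otherwise None (which counts as false in an if test) • When writing the regex, prefix the string with r to make it a "raw string", so backslashes such as \d reach the re module unchanged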
Search Example
    import re
    line = "Cats are smarter than dogs"
    if re.search(r'.*are.*than.*', line):
        print("yes")
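The same sentence can also show what the returned Match object holds and what an optional flag changes. This is a sketch: the patterns `smarter`, `birds`, and `cats` are chosen only for the illustration, and `re.IGNORECASE` is one of the standard flags.

```python
import re

line = "Cats are smarter than dogs"

# A successful search returns a Match object with the text and its position.
found = re.search(r'smarter', line)
print(found.group(0), found.start(), found.end())  # smarter 9 16

# An unsuccessful search returns None.
missing = re.search(r'birds', line)

# Matching is case-sensitive unless a flag says otherwise.
case_sensitive = re.search(r'cats', line)
ignoring_case = re.search(r'cats', line, re.IGNORECASE)
print(missing, case_sensitive, ignoring_case.group(0))  # None None Cats
```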
Groups • Using () in a regex creates a group that can be referenced later. • The string that matches the entire regex is said to be group 0. • Other groups are numbered, starting at 1.
Grouping example
    import re
    m = re.search(r'(\w+) (\w+)', "Shannon Lynn Duvall")
    m.group(0)   # 'Shannon Lynn'
    m.group(1)   # 'Shannon'
    m.group(2)   # 'Lynn'
Grouping Example • Inside a regex, \1 matches the same text that group 1 matched. • Would it match?
    m = re.search(r'(\w+) \1', "Shannon Shannon")
• Space taken out:
    m = re.search(r'(\w+)\1', "Shannon Shannon")
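Running both searches answers the question. The sketch below uses the two patterns from the slide, and the variable names are made up. The second result is easy to get wrong: with the space removed, the only place where some text is immediately followed by a copy of itself is the double n inside the first "Shannon".

```python
import re

# With the space: the group matches "Shannon", then a space, then \1 repeats it.
with_space = re.search(r'(\w+) \1', "Shannon Shannon")
print(with_space.group(0))   # Shannon Shannon

# Without the space: the group must be followed directly by a copy of itself.
without_space = re.search(r'(\w+)\1', "Shannon Shannon")
print(without_space.group(0), without_space.span())  # nn (3, 5)
```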
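Nested groups • Groups are numbered by the position of their opening parenthesis, counting from left to right, so an outer group gets a lower number than the groups nested inside it.
    m = re.search(r'(a(b)c)d', "abcd")
    m.group(0)   # 'abcd'
    m.group(1)   # 'abc'
    m.group(2)   # 'b'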
sub: search and replace • re.sub(regex, putIn, string, <flags>)
    phone = "1-800-555-9090"
    newPhone = re.sub(r'\D', "", phone)
• What is newPhone?
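Running the substitution answers the question: every non-digit is replaced by the empty string, so only the digits remain. The second call is an extra illustration that is not on the slide; it shows that the replacement string may refer back to groups with \1 and \2.

```python
import re

phone = "1-800-555-9090"

# \D matches each hyphen, and each one is replaced by nothing.
new_phone = re.sub(r'\D', "", phone)
print(new_phone)   # 18005559090

# Illustration only: swap two words by referring to the groups in the replacement.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', "Shannon Duvall")
print(swapped)     # Duvall Shannon
```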
findall • Search for all matches and return them as a list
    song = "12 drummers drumming, 11 pipers piping, 10 lords a leaping"
    nums = re.findall(r'\d+', song)
• nums is now ['12', '11', '10']
split • Split a string, using the matches of a regex as the delimiters.
    verses = re.split(r'\d+', song)
• verses is ['', ' drummers drumming, ', ' pipers piping, ', ' lords a leaping']
• The first item is empty because the string begins with a delimiter.
split with groups • Sometimes you want the delimiter to show up in the list. Use a group: the text matched by the group will be returned in the list.
    verses = re.split(r'(\d+)', song)
• verses is: ['', '12', ' drummers drumming, ', '11', ' pipers piping, ', '10', ' lords a leaping']
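One possible use of the split-with-groups result is to pair each number with the phrase that follows it. This is a sketch that is not part of the slides; the names `parts`, `counts`, `phrases`, and `gifts` are made up.

```python
import re

song = "12 drummers drumming, 11 pipers piping, 10 lords a leaping"

parts = re.split(r'(\d+)', song)

# After the leading empty string, delimiters and text alternate.
counts = parts[1::2]    # ['12', '11', '10']
phrases = parts[2::2]   # the text that follows each number

# Pair them up, trimming the spaces and commas around each phrase.
gifts = [(int(c), p.strip(' ,')) for c, p in zip(counts, phrases)]
print(gifts)
# [(12, 'drummers drumming'), (11, 'pipers piping'), (10, 'lords a leaping')]
```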
Examples • You have a string that represents a poker hand: • a,k,q,j for ace, king, queen, jack • 1-9 for numbers 1-9 • 0 for 10
How would you: • Make sure a string is a valid hand? • Check for a pair of sevens? • Check for any pair? • Check for 3 of a kind? • Check for a full house?
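One possible set of answers is sketched below. It rests on assumptions the slides do not state: a hand is a string of exactly five card characters with no suits, in any order, such as "7k7q2". The function names are made up, and other patterns would work equally well. The full-house check sorts the hand first so that equal cards sit next to each other; that step is a design choice for this sketch, not something the exercise requires.

```python
import re

def is_valid_hand(hand):
    # Assumption: exactly five characters, each a legal card symbol.
    return re.fullmatch(r'[akqj0-9]{5}', hand) is not None

def has_pair_of_sevens(hand):
    # At least two sevens, with anything in between.
    return re.search(r'7.*7', hand) is not None

def has_any_pair(hand):
    # Some card, then later the same card again.
    return re.search(r'(.).*\1', hand) is not None

def has_three_of_a_kind(hand):
    # Some card that appears at least three times.
    return re.search(r'(.).*\1.*\1', hand) is not None

def is_full_house(hand):
    # Sort so equal cards are adjacent, then look for 3+2 or 2+3.
    # (?!\1) and (?!\3) require the second rank to differ from the first.
    ordered = ''.join(sorted(hand))
    pattern = r'(.)\1\1(?!\1)(.)\2|(.)\3(?!\3)(.)\4\4'
    return re.fullmatch(pattern, ordered) is not None

print(is_valid_hand("ak7q0"), has_pair_of_sevens("7k7q2"),
      has_any_pair("a2345"), has_three_of_a_kind("7k7q7"),
      is_full_house("7k7k7"))
# True True False True True
```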