110 likes | 213 Views
CSC1018F: Regular Expressions. Diving into Python Ch. 7 Number Systems. Lecture Outline. Recap of OO Python [week 3] Regular Expressions Standard Verbose Number Systems Binary, decimal, hexadecimal. Recap of OO Python. Object Orientation: Module importing
E N D
CSC1018F:Regular Expressions Diving into Python Ch. 7 Number Systems
Lecture Outline • Recap of OO Python [week 3] • Regular Expressions • Standard • Verbose • Number Systems • Binary, decimal, hexadecimal
Recap of OO Python • Object Orientation: • Module importing • Defining, initializing and instantiating Classes • Class attributes • Class methods • Exceptions • File Handling: • Opening, reading, writing and closing
Intro to Regular Expressions • Regular expressions are a powerful means for parsing text to identify complex patterns of characters • Standard string methods (find, replace, split) can be insufficient in complex cases • But regular expressions can be complicated and difficult to read so avoid them if string methods will do the job • Read regular expressions from left to right • Usage: • Import re # regular expression functionality in re module • Re.sub(regexpr, repstr, inputstr) # typical search & replace
Format of Regular Expressions • Syntax: • $ - end of string marker • ^ - start of string marker • \b - word boundary marker (to avoid backslash escapes use a raw string - r"stringcontents") • ? - optional match to a single character • (A|B|C) - indicates mutually exclusive options A, B and C • Examples: • re.sub(r"\bROAD$", "RD.", addr) • addr: 60 BROAD ROAD 60 BROAD RD. • re.search(r"^(a|b|c) -", question) • question: a - how are you? <SRE_Match object …>
Further Syntax • P{n, m} syntax: • Deals with repeating patterns • Read as pattern P appears at least n times but no more than m times • More syntax: • \d - any numeric digit • \D - any character except a numeric digit • + - 1 or more • * - 0 or more • ( ) - to indicate groups • Examples: • >>> phPat = re.compile(r"^(\d{3})\D*(\d{7})$") • >>> phPat.search(“021 6504058”).groups() • (‘021’, ‘6504058’)
Verbose Regular Expressions • So far only compact regular expressions • To aid readability we would like to include comments and spaces • Use re.VERBOSE as the last arguments to re functions • Whitespace is ignored • Comments ( # commentstr) are ignored • Example: pattern = """ ^ # beginning of string $ # end of string """
Case Study • Counting 1-10 in roman numerals • Additive and subtractive combination of I (=1), V(=5), X (=10) • Can have at most 3 of a particular numeral in a row >>> roman = r"^(I?X|IV|V?I{0,3})$" >>> re.search(roman, "X") <_sre.SRE_Match object at 0x1e55be0> >>> re.search(roman, "VIII") <_sre.SRE_Match object at 0x1e55ba0> >>> re.search(roman, "") <_sre.SRE_Match object at 0x1e55ce0> >>> re.search(roman, "IIII") == None True
Number Systems • Decimal (base 10) • Digits (0-9) • Each place represents a power of ten • 172 = 2*100 + 7*101 + 1*102 = 172 • Binary (base 2) • Digits (0,1) • Each place represents a power of two • 10011 = 1*20 + 1*21 + 0* 22 + 0* 23 + 1* 24 = 19 • Hexadecimal (base 16) • Digits (0-9, A-F) • A-F represent 10-15 • Each place represents a power of sixteen • E.g., F7A = 10*160 + 7* 161 + 15* 162 = 3962
Conversion • Decimal to others • Repeatedly divide number by base and populate places from right to left with the remainder • E.g. Dec2Bin: 50 / 2 [% = 0] = 25 / 2 [% = 1] = 12 / 2 [% = 0] = 6 / 2 [% = 0] = 3 / 2 [% = 1] = 1 / 2 [% = 1] = 0 [110010] • Bin2Hex: • Collect binary digits into groups of four and convert • E.g., 111000011111 = 1110 0001 1111 = E1F • Hex2Bin • Hexadecimal digits convert into groups of four binary digits • E.g., A7C = 1010 0111 1100 = 101001111100 • Hex is used because: • It is easy to convert to and from binary • Offers a more compact representation
Revision Exercise • Create a function which will take a date string in any one of the following formats: • dd/mm/yyyy or dd/mm/yy • Other separators (e.g., ‘\’, ‘ ‘, ‘-’) are also allowed • Single figure entries may have the form x or 0x, e.g. 3/4/5 or 03/04/05 • dd month yy or yyyy where month may be written in full (December) or abbreviated (Dec. or Dec) • And return it in the format: • dd month(in full) yyyy, e.g. 13 March 2006 • Implement this using regular expressions and also implement range checking on dates