
CSC1018F: Regular Expressions


estebanb

Presentation Transcript


  1. CSC1018F: Regular Expressions • Diving into Python Ch. 7 • Number Systems

  2. Lecture Outline • Recap of OO Python [week 3] • Regular Expressions: standard, verbose • Number Systems: binary, decimal, hexadecimal

  3. Recap of OO Python • Object Orientation: module importing; defining, initializing and instantiating classes; class attributes; class methods; exceptions • File Handling: opening, reading, writing and closing

  4. Intro to Regular Expressions • Regular expressions are a powerful means of parsing text to identify complex patterns of characters • Standard string methods (find, replace, split) can be insufficient in complex cases • But regular expressions can be complicated and difficult to read, so avoid them if string methods will do the job • Read regular expressions from left to right • Usage: • import re # regular expression functionality lives in the re module • re.sub(regexpr, repstr, inputstr) # typical search & replace
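To illustrate why a string method can fall short, here is a minimal sketch in Python 3 (the slides predate it, so treat the version as an assumption). The address string and the variable names are illustrative only. It uses the `$` end-of-string marker to restrict the substitution to the last word.

```python
import re

addr = "100 NORTH BROAD ROAD"

# str.replace rewrites every occurrence, including the "ROAD" inside "BROAD".
replaced = addr.replace("ROAD", "RD.")

# re.sub with the $ anchor only rewrites "ROAD" at the very end of the string.
substituted = re.sub(r"ROAD$", "RD.", addr)

print(replaced)
print(substituted)
```

The string method has no way to say "only at the end", which is the kind of case where a regular expression earns its extra complexity.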

  5. Format of Regular Expressions • Syntax: • $ - end of string marker • ^ - start of string marker • \b - word boundary marker (to avoid doubling backslashes, use a raw string: r"stringcontents") • ? - makes the preceding character optional • (A|B|C) - mutually exclusive options A, B and C • Examples: • re.sub(r"\bROAD$", "RD.", addr) • addr: 60 BROAD ROAD → 60 BROAD RD. • re.search(r"^(a|b|c) -", question) • question: a - how are you? → <SRE_Match object …>
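The markers above can be tried out directly. The following is a small sketch, assuming Python 3; the `colou?r` pattern and the variable names are illustrative additions, not taken from the slides.

```python
import re

# \b stops "ROAD" from matching inside "BROAD"; $ pins the match to the end.
fixed = re.sub(r"\bROAD$", "RD.", "60 BROAD ROAD")

# ^ pins the alternation (a|b|c) to the start of the string.
match = re.search(r"^(a|b|c) -", "a - how are you?")
no_match = re.search(r"^(a|b|c) -", "d - how are you?")

# ? makes the preceding character (here "u") optional.
optional = [bool(re.search(r"^colou?r$", w)) for w in ("color", "colour", "colouur")]

print(fixed)
print(match.group(1), no_match)
print(optional)
```

`re.search` returns a match object on success and `None` on failure, so the result can be used directly in an `if` test.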

  6. Further Syntax • P{n,m} syntax: • Deals with repeating patterns • Read as: pattern P appears at least n times but no more than m times • More syntax: • \d - any numeric digit • \D - any character except a numeric digit • + - 1 or more • * - 0 or more • ( ) - to indicate groups • Examples: • >>> phPat = re.compile(r"^(\d{3})\D*(\d{7})$") • >>> phPat.search("021 6504058").groups() • ('021', '6504058')
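As a sketch of how the phone-number pattern above behaves on a few inputs (Python 3 assumed; the extra test strings are made up for illustration):

```python
import re

# Three digits, any run of non-digits, then exactly seven digits.
ph_pat = re.compile(r"^(\d{3})\D*(\d{7})$")

# \D* tolerates a space, a dash, or no separator at all.
parsed = [ph_pat.search(s).groups()
          for s in ("021 6504058", "021-6504058", "0216504058")]

# One digit short: \d{7} cannot be satisfied, so search returns None.
too_short = ph_pat.search("021 650405")

# {n,m} bounds the repetition count: two or three "a" characters only.
bounded = [bool(re.search(r"^a{2,3}$", s)) for s in ("a", "aa", "aaa", "aaaa")]

print(parsed)
print(too_short)
print(bounded)
```

Note that the bounds are written without a space (`{2,3}`); in Python's `re` module a space inside the braces makes the construct match literally instead of acting as a repetition count.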

  7. Verbose Regular Expressions • So far we have only seen compact regular expressions • To aid readability we would like to include comments and spaces • Pass re.VERBOSE as the last argument to re functions • Whitespace is ignored • Comments (# commentstr) are ignored • Example:
       pattern = """
           ^    # beginning of string
           $    # end of string
       """
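For a fuller illustration, the phone-number pattern from the previous slide can be rewritten in verbose form. This is a sketch assuming Python 3; the comments inside the pattern are an interpretation of what each piece is for.

```python
import re

ph_pat = re.compile(r"""
    ^           # beginning of string
    (\d{3})     # area code: exactly three digits
    \D*         # optional separator: any number of non-digits
    (\d{7})     # subscriber number: exactly seven digits
    $           # end of string
    """, re.VERBOSE)

groups = ph_pat.search("021 6504058").groups()
print(groups)
```

Because whitespace is ignored in verbose mode, a literal space in the pattern has to be escaped or written as a character class such as `[ ]`.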

  8. Case Study • Counting 1-10 in Roman numerals • Additive and subtractive combination of I (=1), V (=5), X (=10) • Can have at most 3 of a particular numeral in a row • Example (note that the empty string also matches, because every part of the pattern is optional):
       >>> roman = r"^(I?X|IV|V?I{0,3})$"
       >>> re.search(roman, "X")
       <_sre.SRE_Match object at 0x1e55be0>
       >>> re.search(roman, "VIII")
       <_sre.SRE_Match object at 0x1e55ba0>
       >>> re.search(roman, "")
       <_sre.SRE_Match object at 0x1e55ce0>
       >>> re.search(roman, "IIII") is None
       True
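One way to gain confidence in the case-study pattern is to run it against all ten valid numerals and a handful of malformed ones. A minimal sketch, assuming Python 3; the list of invalid strings is an illustrative choice.

```python
import re

roman = re.compile(r"^(I?X|IV|V?I{0,3})$")

valid = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
invalid = ["IIII", "VV", "IIX", "XI", "VX"]

all_valid = all(roman.search(n) for n in valid)
none_invalid = not any(roman.search(n) for n in invalid)

# Every branch of the pattern is optional, so the empty string is accepted too.
empty_matches = roman.search("") is not None

print(all_valid, none_invalid, empty_matches)
```

If the empty string should be rejected, the caller can check for it separately before applying the pattern.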

  9. Number Systems • Decimal (base 10) • Digits (0-9) • Each place represents a power of ten • 172 = 2*10^0 + 7*10^1 + 1*10^2 = 172 • Binary (base 2) • Digits (0, 1) • Each place represents a power of two • 10011 = 1*2^0 + 1*2^1 + 0*2^2 + 0*2^3 + 1*2^4 = 19 • Hexadecimal (base 16) • Digits (0-9, A-F) • A-F represent 10-15 • Each place represents a power of sixteen • E.g., F7A = 10*16^0 + 7*16^1 + 15*16^2 = 3962
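The place-value sums above can be sketched as a short function. This assumes Python 3; `to_decimal` and `DIGITS` are hypothetical names introduced for illustration. The built-in `int(text, base)` performs the same conversion and is used as a cross-check.

```python
DIGITS = "0123456789ABCDEF"

def to_decimal(numeral, base):
    """Sum digit * base**place, counting places from the right."""
    total = 0
    for place, digit in enumerate(reversed(numeral.upper())):
        value = DIGITS.index(digit)  # raises ValueError for unknown characters
        if value >= base:
            raise ValueError(f"digit {digit!r} is not valid in base {base}")
        total += value * base ** place
    return total

print(to_decimal("172", 10))
print(to_decimal("10011", 2))
print(to_decimal("F7A", 16), int("F7A", 16))
```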

  10. Conversion • Decimal to others • Repeatedly divide the number by the base and populate places from right to left with the remainders • E.g. Dec2Bin of 50: 50 / 2 = 25 rem 0; 25 / 2 = 12 rem 1; 12 / 2 = 6 rem 0; 6 / 2 = 3 rem 0; 3 / 2 = 1 rem 1; 1 / 2 = 0 rem 1; reading the remainders from last to first gives 110010 • Bin2Hex: • Collect binary digits into groups of four (starting from the right) and convert each group • E.g., 111000011111 = 1110 0001 1111 = E1F • Hex2Bin: • Each hexadecimal digit converts into a group of four binary digits • E.g., A7C = 1010 0111 1100 = 101001111100 • Hex is used because: • It is easy to convert to and from binary • It offers a more compact representation
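The three conversions can be sketched as follows, assuming Python 3 and non-negative integers. The function names and the `DIGITS` table are hypothetical and only serve the illustration.

```python
DIGITS = "0123456789ABCDEF"

def from_decimal(number, base):
    """Repeated division: each remainder becomes the next place, right to left."""
    if number == 0:
        return "0"
    places = []
    while number > 0:
        number, remainder = divmod(number, base)
        places.append(DIGITS[remainder])
    return "".join(reversed(places))

def bin_to_hex(bits):
    """Pad on the left to a multiple of four bits, then convert each group."""
    width = -(-len(bits) // 4) * 4  # round the length up to a multiple of 4
    padded = bits.zfill(width)
    groups = [padded[i:i + 4] for i in range(0, width, 4)]
    return "".join(DIGITS[int(group, 2)] for group in groups)

def hex_to_bin(hex_digits):
    """Expand each hexadecimal digit into exactly four binary digits."""
    return "".join(format(int(d, 16), "04b") for d in hex_digits)

print(from_decimal(50, 2))
print(bin_to_hex("111000011111"))
print(hex_to_bin("A7C"))
```

Padding on the left matters for inputs whose length is not a multiple of four, since the grouping has to start from the least significant bit.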

  11. Revision Exercise • Create a function which takes a date string in any one of the following formats: • dd/mm/yyyy or dd/mm/yy • Other separators (e.g., '\', ' ', '-') are also allowed • Single-figure entries may have the form x or 0x, e.g. 3/4/5 or 03/04/05 • dd month yy or dd month yyyy, where month may be written in full (December) or abbreviated (Dec. or Dec) • And returns it in the format: • dd month (in full) yyyy, e.g. 13 March 2006 • Implement this using regular expressions, and also implement range checking on dates
