CS 311 – Lecture 06 Outline
• Regular Expressions
• Worksheet 1 practice
CS 311 - Operating Systems I
What is Regexp?
• A regular expression (regex) is a pattern that describes a set of strings
• Used to describe search patterns
• Useful
  • To extract information from log files
  • To generate reports
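As a rough illustration of extracting information from log files, the Python sketch below pulls the time and the message out of every ERROR line. The log format, the sample lines, and the names `ERROR_RE` and `error_report` are hypothetical, chosen only for this example.

```python
import re

# Hypothetical log lines; this format is an assumption made for illustration.
LOG_LINES = [
    "2024-01-15 10:32:07 ERROR disk /dev/sda1 is 95% full",
    "2024-01-15 10:33:12 INFO backup finished",
    "2024-01-15 10:40:55 ERROR disk /dev/sdb2 is 99% full",
]

# Group 1 captures the HH:MM:SS time, group 2 the rest of an ERROR line.
ERROR_RE = re.compile(r"^\S+ (\d{2}:\d{2}:\d{2}) ERROR (.*)$")


def error_report(lines):
    """Return (time, message) pairs for every line logged at level ERROR."""
    report = []
    for line in lines:
        m = ERROR_RE.match(line)
        if m:  # non-ERROR lines do not match and are skipped
            report.append((m.group(1), m.group(2)))
    return report


for time, msg in error_report(LOG_LINES):
    print(time, msg)
```

This prints `10:32:07 disk /dev/sda1 is 95% full` and `10:40:55 disk /dev/sdb2 is 99% full`; the INFO line is skipped.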
Defining regex
• Regex patterns are built from literal characters and meta-characters.
• Literal characters include letters, digits, and symbols with no special meaning; each one matches itself.
• Examples of meta-characters:
  • [] – define a character set or range, e.g. [a-z]
  • () – group patterns and capture them for backreferencing
  • \ – escape sequence (e.g. \. matches a literal dot)
  • ^ – matches the beginning of a string (negates the set when it is the first character inside [])
  • $ – matches the end of a string
  • | – OR (alternation)
  • ? – matches 0 or 1 occurrence of the preceding item
  • * – matches 0 or more occurrences of the preceding item
  • + – matches 1 or more occurrences of the preceding item
  • . – matches any single character except newline ("\n")
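The meta-characters above can be tried out with Python's standard `re` module, a Perl-style engine. The sketch below is illustrative only: the sample strings are made up, and `re.findall` is used because it returns every non-overlapping match. Grouping with `()` is checked separately with `re.fullmatch`, since `findall` returns group contents rather than whole matches when a pattern has groups.

```python
import re

# (pattern, text) pairs; re.findall returns every non-overlapping match.
DEMOS = [
    (r"[0-9]", "a1b2"),            # character range
    (r"[^0-9]", "a1b2"),           # negated set: ^ first inside []
    (r"^ab", "abab"),              # ^ anchors at the start of the string
    (r"ab$", "abab"),              # $ anchors at the end of the string
    (r"cat|dog", "cat dog"),       # | is OR
    (r"colou?r", "color colour"),  # ? : zero or one 'u'
    (r"ab*", "a ab abbb"),         # * : zero or more 'b'
    (r"ab+", "a ab abbb"),         # + : one or more 'b'
    (r"a.c", "abc a\nc"),          # . matches any character except \n
    (r"\.", "a.b"),                # \ escapes . so it matches a literal dot
]

for pattern, text in DEMOS:
    print(f"{pattern:10} {text!r:16} -> {re.findall(pattern, text)}")

# () groups "ab" so + repeats the whole group, not just the "b".
print(re.fullmatch(r"(ab)+", "ababab") is not None)  # True
print(re.fullmatch(r"ab+", "ababab") is not None)    # False
```

For example, `colou?r` finds both `color` and `colour`, `ab*` finds `a`, `ab` and `abbb`, while `ab+` finds only `ab` and `abbb`.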
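Regex Rules
• Matching is greedy: *, + and ? consume as much text as possible while still letting the rest of the pattern match. (POSIX tools such as grep return the leftmost-longest match overall.)
• Inside [], the usual meta-characters +, *, ., ?, | and $ are regular characters. Only ^ (as the first character), - (between two characters), ] and, in Perl-style engines, \ keep a special meaning.
• All characters between \Q and \E are interpreted as literal characters (supported in Perl, PCRE and Java, but not in Python's re module).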
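The sketch below checks these rules with Python's `re`. The lazy form `.*?` is shown only for contrast with greedy `.*`. Because Python does not support `\Q...\E`, the sketch uses `re.escape()` as the nearest equivalent; the sample strings are made up.

```python
import re

html = "<b>bold</b>"

# Greedy: .* grabs as much as it can, so the match ends at the LAST '>'.
greedy = re.search(r"<.*>", html).group()
# Lazy .*? (shown for contrast) stops at the FIRST '>' instead.
lazy = re.search(r"<.*?>", html).group()

# Inside [] the characters + * . ? | $ are ordinary literals.
literals = re.findall(r"[+*.?|$]", "a+b*c.d?e|f$g")

# Unescaped, + and ? act as quantifiers, so the literal text is not found.
unescaped = re.search(r"1+1=2?", "is 1+1=2? yes")
# Python's re has no \Q...\E; re.escape() makes every character literal instead.
found = re.search(re.escape("1+1=2?"), "is 1+1=2? yes").group()

print(greedy)     # <b>bold</b>
print(lazy)       # <b>
print(literals)   # ['+', '*', '.', '?', '|', '$']
print(unescaped)  # None
print(found)      # 1+1=2?
```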
More meta-characters
• \A and \Z – match the start and end of the whole string, even in multi-line mode, where ^ and $ also match around each newline (\n). In Perl and PCRE, \Z may also match just before a final newline; \z matches only at the very end.
• \b – word boundary; \bword\b performs a whole-word search
• \d – same as [0-9] (digits)
• \w – same as [0-9a-zA-Z_] (word characters: letters, digits and underscore)
• {m,n} – repetition (the preceding item may occur m to n times)
  Example: [0-9]{2,3} matches a run of two or three digits, such as 45 or 100. [0-9]{2} matches exactly two digits.
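A short Python sketch of these meta-characters follows. It assumes Python's `re` semantics, where `\Z` matches only at the very end of the string, and `re.MULTILINE` is passed so that `^` and `$` match at every line. Note that in Python, `\d` and `\w` on `str` patterns also accept non-ASCII digits and letters unless `re.ASCII` is given. The sample strings are made up.

```python
import re

text = "first line\nsecond line"

# In MULTILINE mode ^ and $ match at every line, \A and \Z only at the string ends.
caret = re.findall(r"^\w+", text, re.MULTILINE)   # ['first', 'second']
big_a = re.findall(r"\A\w+", text, re.MULTILINE)  # ['first']
dollar = re.findall(r"\w+$", text, re.MULTILINE)  # ['line', 'line']
big_z = re.findall(r"\w+\Z", text, re.MULTILINE)  # ['line']

# \b...\b restricts the search to whole words.
whole = re.findall(r"\bcat\b", "cat concatenate cats cat.")  # ['cat', 'cat']

# \w matches letters, digits and underscore.
words = re.findall(r"\w+", "snake_case x2!")  # ['snake_case', 'x2']

# {2,3} repeats [0-9] two or three times; the match is not tied to whole numbers...
runs = re.findall(r"[0-9]{2,3}", "7 45 100 2024")  # ['45', '100', '202']
# ...unless word boundaries are added around it.
nums = re.findall(r"\b[0-9]{2,3}\b", "7 45 100 2024")  # ['45', '100']
```

The last two lines show why `{m,n}` is often combined with `\b`: without boundaries, `2024` still yields the run `202`.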
Backreferencing
• Text matched by a pattern within () is captured and stored by the regex engine.
• The stored results can be referenced later in the pattern as \1, \2, \3, etc., numbered by the order of the opening parentheses.
• Examples:
  • \d([:-])\d\1\d matches 1-2-3 or 4:9:8 but does not match 4-8:9, because \1 must repeat the same separator.
  • (\w)([:-])\1\2\w matches A-A-y or 0:0:g but does not match A-B:C.
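The two backreference examples can be checked in Python as sketched below. `fullmatch` is used so that each pattern must cover the whole test string. The doubled-word search and the `re.sub` swap are extra illustrations, not from the slide, and their sample strings are made up.

```python
import re

SEP = re.compile(r"\d([:-])\d\1\d")     # \1 must repeat the first separator
PAIR = re.compile(r"(\w)([:-])\1\2\w")  # \1 repeats the char, \2 the separator

for s in ["1-2-3", "4:9:8", "4-8:9"]:
    print(s, SEP.fullmatch(s) is not None)   # True, True, False
for s in ["A-A-y", "0:0:g", "A-B:C"]:
    print(s, PAIR.fullmatch(s) is not None)  # True, True, False

# Backreferences also find repeated words...
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)  # 'is'
# ...and can be reused in a replacement string.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")  # 'world hello'
```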
Example
• String: yabadabadoo
• Patterns:
  • y.*ba – matches yabadaba
  • a.*da – matches abada
  • ^b – does not match anything
  • ^.*ba – matches yabadaba
  • [abd] – matches a, b, a, d, a, b, a, d
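To check these results, the Python sketch below runs each pattern with `re.search`, which returns the leftmost match with greedy quantifiers. It also tries `^.ba`, which requires exactly one character before `ba` and therefore finds no match in this string, next to `^.*ba`, which matches `yabadaba`. The helper name `first_match` is made up for the example.

```python
import re

S = "yabadabadoo"


def first_match(pattern, text=S):
    """Return the leftmost match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


for pattern in [r"y.*ba", r"a.*da", r"^b", r"^.ba", r"^.*ba"]:
    print(f"{pattern:6} -> {first_match(pattern) or 'no match'}")

# A bare character set matches one character at a time.
print("[abd]  ->", re.findall(r"[abd]", S))
```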
Let's solve the worksheet!